Steganography: from its origins to the present
The term steganography refers to a technique for hiding communication between two parties. It combines the Greek words στεγανός (covered) and γραφία (writing). Unlike encryption, which makes a message incomprehensible to anyone who lacks the key to decipher it, steganography aims to conceal the very existence of the message from prying eyes.
The origins
Steganography was already practiced in ancient Greece: Herodotus recounts two examples in his Histories. The first recorded use of the term, however, dates to 1499, when Johannes Trithemius wrote his Steganographia, a treatise on cryptography and steganography disguised as a book about magic. Initially the author decided not to print it and even destroyed large parts of it, believing that they should never have seen the light of day, but the text continued to circulate in manuscript form and was published posthumously in 1606.
Since then, many people throughout history have used this technique to deliver messages covertly. It is known that during both world wars female spies used knitting to send messages, for instance by making an irregular stitch or leaving an intentional hole in the fabric.
Steganographic models and techniques
Steganography involves two kinds of message: the “container” (or cover) message and the secret message. The first has the task of hiding the contents of the second, so that it remains invisible to any eavesdropper. Generally, hidden messages appear to be (or are part of) something else: images, articles, lists or other cover text. For example, the hidden message may be written in invisible ink between the lines of a private letter.
Essentially there are two main steganographic models: injective steganography and generative steganography. Injective steganography, the more widely used of the two, consists of inserting (injecting) the secret message into another message that acts as a container, in such a way that the change is not visible to the human eye and the result is practically indistinguishable from the original. Generative steganography instead takes the secret message and builds a suitable container around it, so as to hide it in the best possible way.
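To make the generative model concrete, here is a minimal sketch in Python. It builds a container (an innocuous-looking shopping list) around the secret, instead of modifying an existing one. The word pairs, the function names and the "first item means 0, second item means 1" convention are all assumptions made for illustration; they are not taken from any particular tool.

```python
# Hypothetical codebook shared by sender and receiver: each slot offers two
# plausible items. Picking the first encodes bit 0, the second encodes bit 1.
PAIRS = [
    ("apples", "pears"),
    ("milk", "yogurt"),
    ("bread", "rice"),
    ("tea", "coffee"),
    ("eggs", "cheese"),
    ("soap", "shampoo"),
    ("onions", "carrots"),
    ("salt", "sugar"),
]


def generate_cover(secret: bytes) -> str:
    """Build a shopping list whose item choices spell out the secret bits."""
    items = []
    for byte in secret:
        for position, pair in enumerate(PAIRS):
            bit = (byte >> (7 - position)) & 1  # most significant bit first
            items.append(pair[bit])
    return "\n".join(items)


def recover_secret(cover: str) -> bytes:
    """Read the list back, eight items per byte, using the same codebook."""
    items = cover.splitlines()
    secret = bytearray()
    for start in range(0, len(items), 8):
        byte = 0
        for position, item in enumerate(items[start:start + 8]):
            byte = (byte << 1) | PAIRS[position].index(item)
        secret.append(byte)
    return bytes(secret)


cover = generate_cover(b"A")
print(cover)                  # an eight-line shopping list
print(recover_secret(cover))  # b'A'
```

In this sketch a secret longer than one byte repeats the same eight slots, which would quickly look implausible; a realistic generator would need a much larger and more varied codebook.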
As far as techniques are concerned, substitution is undoubtedly the most widespread, so much so that when we talk about steganography we often implicitly mean this technique. It rests on a simple observation: most communication channels (telephone lines, radio transmissions, etc.) carry signals that are always accompanied by some kind of noise. This noise can be replaced by another signal, the secret message, transformed in such a way that, without knowledge of a secret key, it is indistinguishable from the actual noise and can therefore be transmitted without arousing suspicion.
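The substitution idea can be sketched in a few lines of Python. The example below assumes the "noise" is the least significant bit of each sample in a list of non-negative integers, and it masks the message with a keystream derived from the key using SHA-256 in a simple counter construction. Both choices, and all function names, are illustrative assumptions rather than a vetted design.

```python
import hashlib


def keystream_bits(key: bytes, count: int) -> list[int]:
    """Derive `count` pseudo-random bits from the key (illustrative only)."""
    bits = []
    counter = 0
    while len(bits) < count:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        for byte in block:
            for shift in range(7, -1, -1):
                bits.append((byte >> shift) & 1)
        counter += 1
    return bits[:count]


def to_bits(data: bytes) -> list[int]:
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def from_bits(bits: list[int]) -> bytes:
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def embed(samples: list[int], message: bytes, key: bytes) -> list[int]:
    """Replace the lowest bit of the first samples with the masked message."""
    bits = to_bits(message)
    if len(bits) > len(samples):
        raise ValueError("cover is too small for this message")
    masked = [b ^ k for b, k in zip(bits, keystream_bits(key, len(bits)))]
    stego = list(samples)
    for i, bit in enumerate(masked):
        stego[i] = (stego[i] & ~1) | bit  # clear the lowest bit, then set it
    return stego


def extract(samples: list[int], length: int, key: bytes) -> bytes:
    """Recover `length` bytes; the receiver is assumed to know the length."""
    count = length * 8
    masked = [s & 1 for s in samples[:count]]
    return from_bits([m ^ k for m, k in zip(masked, keystream_bits(key, count))])


cover = [200, 201, 199, 200] * 8          # 32 made-up samples
stego = embed(cover, b"hi", b"secret key")
print(extract(stego, 2, b"secret key"))   # b'hi'
```

Each sample changes by at most one unit, which is the sense in which the message takes the place of the noise. Extracting with a different key almost certainly yields unrelated bytes.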
Modern steganography
Modern steganography dates from 1985, when personal computers began to be applied to classical steganography problems. Development was slow at first, but a large number of steganography programs exist today. Because steganography is a form of security through obscurity, a steganographic algorithm, unlike a cryptographic one, must take into account the plausible form that the generated data must have, so that it does not arouse suspicion.
In digital steganography, electronic communications can carry a steganographic encoding inside a carrier such as a document file, an image file, a program or a protocol. Multimedia files are ideal for steganographic transmission because of their large size. For example, a sender might take a harmless image file and adjust the color of one pixel in every hundred so that it corresponds to a letter of the alphabet. The change is so subtle that someone who is not specifically looking for it is unlikely to notice it.
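The pixel example can be sketched as follows. The code assumes a grayscale image represented as a flat list of values from 0 to 255, and it stores each letter's index (A = 0 through Z = 25) in the five low-order bits of every hundredth pixel. The representation, the five-bit encoding and the function names are assumptions chosen for illustration; the text does not specify how a pixel is made to match a character.

```python
STEP = 100          # one carrier pixel in every hundred
LOW_BITS = 0b11111  # five low-order bits hold a letter index 0..25 (assumed)


def hide_text(pixels: list[int], text: str) -> list[int]:
    """Return a copy of `pixels` with `text` hidden in every hundredth pixel."""
    letters = [ord(char) - ord("A") for char in text.upper()]
    if any(not 0 <= value < 26 for value in letters):
        raise ValueError("only the letters A-Z are supported in this sketch")
    if letters and (len(letters) - 1) * STEP >= len(pixels):
        raise ValueError("image is too small for this text")
    stego = list(pixels)
    for n, value in enumerate(letters):
        index = n * STEP
        # Keep the high bits of the pixel and overwrite only the low five.
        stego[index] = (stego[index] & ~LOW_BITS) | value
    return stego


def reveal_text(pixels: list[int], length: int) -> str:
    """Read `length` letters back; the receiver is assumed to know the length."""
    return "".join(
        chr((pixels[n * STEP] & LOW_BITS) + ord("A")) for n in range(length)
    )


image = [128] * 1000                 # a made-up, uniformly gray image
stego = hide_text(image, "HELLO")
print(reveal_text(stego, 5))         # HELLO
```

In this sketch only five of the thousand pixels change, each by at most 31 gray levels out of 255. Whether that is imperceptible depends on the image; using fewer bits per pixel would make the change subtler at the cost of capacity.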
Steganography today therefore presents itself as an ideal tool for creating covert communication channels, which can be used in sophisticated scenarios of espionage, computer crime and violations of the privacy of both public and private parties.
Defending against steganography: steganalysis
Steganalysis is the counterpart of steganography. It aims to determine whether a file, or any other medium that can carry information, contains a secret message and, if so, to recover the hidden information. The effectiveness of steganalysis techniques depends strictly on the degree of sophistication and “personalization” of the steganographic techniques used by an attacker.
It is easy to see that this is an arms race, driving ever greater sophistication in the techniques and tools used both by those who want to use steganography and by those who want to unmask it and reveal its hidden contents. Of the two, the former generally have the advantage, since they can change the transmission medium and/or the encoding of the information at any time to escape detection.
The role of machine learning
In this scenario, machine learning can be a powerful weapon for those who want to unmask steganography. Machine learning techniques make it possible to build a steganalysis model automatically, starting from a set of sample files with and without steganography.
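As a toy illustration of learning a steganalysis model from labeled samples, the Python sketch below trains about the simplest possible classifier: a single threshold on a single feature. Everything in it is an assumption made for the example: the covers are synthetic "smooth" signals rather than real files, the stego samples are produced by overwriting every least significant bit with random bits, and the feature is the fraction of neighboring values whose lowest bits differ. Real steganalysis relies on far richer features, models and data.

```python
import random


def make_cover(rng: random.Random, size: int = 512) -> list[int]:
    """Synthetic smooth signal: a value that only occasionally moves by one."""
    value = rng.randrange(64, 192)
    pixels = []
    for _ in range(size):
        if rng.random() < 0.1:
            value = min(255, max(0, value + rng.choice((-1, 1))))
        pixels.append(value)
    return pixels


def embed_random_bits(rng: random.Random, pixels: list[int]) -> list[int]:
    """Simulate embedding by overwriting every lowest bit with a random bit."""
    return [(p & ~1) | rng.getrandbits(1) for p in pixels]


def feature(pixels: list[int]) -> float:
    """Fraction of neighboring pairs whose lowest bits differ (needs >= 2 values)."""
    flips = sum((a ^ b) & 1 for a, b in zip(pixels, pixels[1:]))
    return flips / (len(pixels) - 1)


def train_threshold(covers: list[list[int]], stegos: list[list[int]]) -> float:
    """Learn the midpoint between the mean feature of each labeled class."""
    mean_cover = sum(feature(c) for c in covers) / len(covers)
    mean_stego = sum(feature(s) for s in stegos) / len(stegos)
    return (mean_cover + mean_stego) / 2


def looks_stego(pixels: list[int], threshold: float) -> bool:
    # Assumes, as holds for this toy data, that embedding raises the feature.
    return feature(pixels) > threshold


rng = random.Random(0)
covers = [make_cover(rng) for _ in range(50)]
stegos = [embed_random_bits(rng, make_cover(rng)) for _ in range(50)]
threshold = train_threshold(covers, stegos)
print(looks_stego(make_cover(rng), threshold))
print(looks_stego(embed_random_bits(rng, make_cover(rng)), threshold))
```

The separation is easy here only because the synthetic covers are unrealistically smooth. With real images the lowest bits are already close to random, which is why practical detectors need more elaborate features and much larger training sets.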
However, it is important to underline that machine learning (and, more generally, artificial intelligence) is a neutral technology: it is dual-use and does not belong exclusively to the domain of the “good”. In fact, machine learning can also be used to develop more sophisticated, polymorphic, data-driven steganographic techniques.
We need to prepare, because this scenario could represent the future of cyber threats, and perhaps a piece of that future is already here today.