Steganography Concepts

By David Burris, Ph.D., CCP, CSP

Sam Houston State University

Center for Digital Forensics

© Copyright 2005-2009

Steganography is the practice of disguising a message (text, picture, video, auditory, olfactory, etc.) within another message, picture, or communication so that it will not be noticed by those for whom it is not intended. Cryptography assumes a message will probably be intercepted. It prevents unauthorized access by scrambling the message so that, ideally, only the intended recipient can extract its content. Steganography assumes the message can or will be intercepted but that the hidden message will be overlooked. For additional security, a message can be encrypted before being hidden.

Steganography is a camouflage technique that makes the existence of the actual message "invisible" to all but the intended recipient. Encrypted messages tend to draw notice by their very appearance, which encourages attempts to crack the code. An invisible message passes without detection.

Steganography is based on the principle of hiding a message in plain sight. The message container is so routine or nondescript that the opposition fails to notice the hidden content.

 

Steganography was widely used in World War II. Consider the following example of a null cipher (an unencrypted hidden message) used by a German spy in World War II [David Kahn, The Codebreakers, The Macmillan Company, New York, NY, 1967].

 

 

Apparently neutral's protest is thoroughly discounted and ignored. Isman hard hit. Blockade issue affects pretext for embargo on by-products, ejecting suets and vegetable oils.

 

The following message may be obtained by taking the second letter of each word and applying a little manipulation:

Pershing sails from NY June 1.
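
The extraction step can be sketched in a few lines of Python. This is a minimal illustration and not the spy's own procedure. The function name `second_letters` is hypothetical, and punctuation is ignored. The cover text is assumed to use the hyphenated form "by-products" so that it counts as a single word. The trailing "i" is presumably read as the numeral 1 as part of the "little manipulation."

```python
def second_letters(text: str) -> str:
    """Concatenate the second letter of each word, ignoring punctuation."""
    letters = []
    for word in text.split():
        alpha = [c for c in word if c.isalpha()]  # drop "." "," and "'"
        if len(alpha) >= 2:                        # one-letter words carry nothing
            letters.append(alpha[1].lower())
    return "".join(letters)


cover = ("Apparently neutral's protest is thoroughly discounted and ignored. "
         "Isman hard hit. Blockade issue affects pretext for embargo on "
         "by-products, ejecting suets and vegetable oils.")
print(second_letters(cover))  # pershingsailsfromnyjunei
```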

 

The idea is to make the message so innocuous, yet reasonable, that even a filter searching for unusual content will pass it. Mail filters (censors) have been employed to detect such open-text messages by listening for their "sound." The grammar checkers commonly found in today's word processors could serve as message filters by flagging mail with poor grammar for additional inspection.

Additional examples of steganography may be found in sources such as Neil F. Johnson, Zoran Duric, and Sushil Jajodia, "Information Hiding: Steganography and Watermarking - Attacks and Countermeasures." Other frequently quoted sources include Warren Zevon's song "Lawyers, Guns, and Money," which was released on the albums Excitable Boy (1978), Stand in the Fire (1981), A Quiet Normal Life (1986), and Learning to Flinch (1993).

 


Word processors are an abundant source of new ways to apply steganography. They normally offer multiple fonts, including fixed-space (monospaced) and proportional fonts. In a fixed-space font, all characters occupy the same width. In a proportional font, each character occupies only the width it needs. Proportional fonts tend to be more pleasing to the eye. As an example, compare the word "illegal" represented as follows:

 

illegal

In the fixed-space font Courier

illegal

In the proportional font Times New Roman

 

The difference in spacing between fonts suggests interesting methods for hiding text. Consider the following paragraph. It is shown first in 14 point Courier and then in 14 point Times New Roman.

"A study of religion must include the  use  of  the shrines  important to the religious practice. One should also consider how  money  is collected to support the religion. Every  drop  of knowledge must be scrutinized." (Courier 14 point)

"A study of religion must include the  use  of  the shrines  important to the religious practice. One should also consider how  money  is collected to support the religion. Every  drop  of knowledge must be scrutinized." (Times New Roman 14 point)

 

Assuming your browser supports the indicated fonts, the first copy of the message above should occupy considerably more space. That copy is set in Courier, which is a monospaced font. The second copy appears in Times New Roman, which is a proportionally spaced font. A close examination of the message shows that an extra space has been placed before and after some phrases. In the copy below, those phrases are marked with brackets.

 

"A study of religion must include the [use] of [the shrines] important to the religious practice. One should also consider how [money] is collected to support the religion. Every [drop] of knowledge must be scrutinized."

 

The hidden message is "use the shrines money drop." In a larger selection of text, the extra spaces are hard to detect even if a fixed-space font is used. The message is especially hard to detect in an appropriate proportional font, because the width of each space is squeezed to improve readability. If the message is in electronic format, you can copy it into a word processor, switch to a fixed-space font, and turn on the display of spaces and other special characters. Doing so greatly improves the ability to detect and read the message. This technique is a variation of word or phrase shifting.
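
A minimal Python sketch of the decoding side follows. It makes three assumptions: each marked phrase is padded by one extra space on each side, single spaces are used everywhere else, and two marked phrases never share a boundary. Under those assumptions, splitting the text on runs of two or more spaces gives chunks that alternate between unmarked and marked. The name `hidden_phrases` is hypothetical.

```python
import re


def hidden_phrases(text: str) -> list[str]:
    """Return the phrases padded with an extra space on both sides.

    Splitting on runs of 2+ spaces yields unmarked, marked, unmarked, ...
    chunks, so the marked phrases sit at the odd indices.
    """
    chunks = re.split(r" {2,}", text)
    return chunks[1::2]


cover = ("A study of religion must include the  use  of  the shrines  important "
         "to the religious practice. One should also consider how  money  is "
         "collected to support the religion. Every  drop  of knowledge must be "
         "scrutinized.")
print(" ".join(hidden_phrases(cover)))  # use the shrines money drop
```

A real document would first need its line breaks and tabs normalized, because those would also disturb the alternation.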

 


Another way to use a word processor to hide text messages follows. Start with a routine document and scan it from left to right, selecting the letters of the hidden message one at a time. When a letter is selected, change its font (or the font of a digit or other character) to a closely related font that the eye cannot distinguish. There are usually several choices. A program can then filter word processing documents and assemble the message from the characters that are not in the base font. Alternatively, move the cursor across the text in a word processor and watch for changes in the font name on the menu bar.

"The study of religion and love is complex. You must never underestimate the power of your religious beliefs."

(container in Times New Roman 16 point)

 

The message hidden in the 16 point Times New Roman text above is "love your spouse." It is formed from the following characters:

Key:

religion - "l" in Arial 14 point

religion - "o" in Arial 14 point

love - "ve" in Arial 14 point

You - "You" in Arial 14 point

never - "r" in Arial 14 point

underestimate - "s" in Arial 14 point

power - "po" in Arial 14 point

your - "u" in CG Arial 14 point

religious - "s" in Arial 14 point

beliefs - "e" (first) in Arial 14 point
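
The embedding and filtering steps can be sketched in Python. Real word-processor files store fonts as formatting runs. Purely as an assumption, this sketch models a document as a list of (character, font name) pairs. The names `Doc`, `embed` and `extract` are hypothetical. When run on the sample sentence, the greedy left-to-right scan recovers "love your spouse," keeping the capital Y of "You."

```python
# Hypothetical document model: each character paired with its font name,
# standing in for the run-level formatting a real word-processor file stores.
Doc = list[tuple[str, str]]


def embed(cover: str, secret: str, base_font: str, alt_font: str) -> Doc:
    """Scan the cover left to right; each time the next needed secret letter
    appears, give that character alt_font instead of base_font."""
    doc: Doc = []
    i = 0  # index of the next secret letter still to be placed
    for ch in cover:
        if i < len(secret) and ch.lower() == secret[i].lower():
            doc.append((ch, alt_font))
            i += 1
        else:
            doc.append((ch, base_font))
    if i < len(secret):
        raise ValueError("cover text does not contain the secret letters in order")
    return doc


def extract(doc: Doc, base_font: str) -> str:
    """Assemble the message from every character that is not in the base font."""
    return "".join(ch for ch, font in doc if font != base_font)


cover = ("The study of religion and love is complex. You must never "
         "underestimate the power of your religious beliefs.")
doc = embed(cover, "loveyourspouse", "Times New Roman", "Arial")
print(extract(doc, "Times New Roman"))  # loveYourspouse
```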


Null ciphers depend on the message being camouflaged well enough not to be noticed if intercepted. Indeed, the usual assumption is that censors will intercept the message but not notice it. It may still be desirable to encrypt the message. Depending on the encryption scheme, messages may be harder or easier to hide.


Available Techniques:

One appealing aspect of steganography is that the number of available techniques is limited only by the imagination of its users. Cryptography has rules; steganography depends only on the cleverness of its users. Examples include:

 

1) Disappearing Ink: In World War II, messages were included in normal correspondence by writing the hidden message between the printed lines of text using milk, vinegar, fruit juices, or urine. These inks are readily available in the field and colorless when dry, yet they reappear when subjected to heat. More sophisticated inks require more complex chemistry, akin to developing photographs.

 

2) Music: As an example, a "yes," "no," or signal to start might be communicated by the way a popular instrumental or song begins or ends. An innocent phrase at the start or end of any communication, such as a radio broadcast, might be used to convey information. Music is also an excellent vehicle for hiding complex drawings, using the techniques described in item 8 below for using picture files as containers for other files.

 

3) Microdots: Microdots were developed by the Germans in World War II. The Allies first discovered their use in 1941, when one was found masquerading as a period on a typed envelope carried by a German spy. The original microdots were photographs about the size of a typed period, and each carried about a page of information. In addition to text, they could accommodate technical drawings and photographs. J. Edgar Hoover, Director of the FBI, has been quoted as calling microdots "the enemy's masterpiece of espionage." Microdots are small enough that the information they contain can be transmitted without encryption or other means of concealment. The idea is that they are so small they will not be noticed when included as part of normal communications.

 

4) Tattoos: Tattoos have been employed for communication. They were placed either on body parts not normally seen or on areas that can be disguised, e.g., with makeup. History records instances of shaving a servant's or slave's head and applying a tattoo. After the hair had grown back, the servant or slave became a living message carrier. The head was shaved again at the destination to retrieve the message.

 

5) One of the earliest recorded uses of steganography is Demaratus notifying Sparta that Xerxes planned to invade Greece. At the time, text was recorded on wooden tablets covered with wax. Demaratus had the wax removed, wrote the message on the wood, and had the tablets coated with wax again. Sentries allowed the tablets to pass with other goods because they appeared never to have been used. Similar techniques can be used with digital media.

 

6) More sophisticated techniques, such as phase shifts in radar, sound, light, or electrical waves. These are physically different from the methods described in item 8 below for hiding graphics in graphics, but the principles are the same. Both analog and digital media may be used to hide information.

 

7) Drawings: Information has been concealed in drawings. Information carriers may be original engineering drawings, original art, or modified copies of popular art or engineering drawings. Lines may be thickened, shortened, or have their colors modified slightly. Sometimes the actual letters of the message have been integrated into the design.

 

8) Computer media, including graphics, sound, and apparently blank media. For graphics (pictures), we normally take advantage of the fact that pictures are stored as an array of pixels. Each pixel has an RGB (red, green, blue) value stored as three consecutive 8-bit numbers (bytes). The eye sees the three primary-color shades as a single blended color. At 8 bits per primary color, there are 2**8 = 256 shades of each of the three primary colors. The number 0 represents the absence of that color, and 255 is its maximum intensity. Hence the most intense red pixel is indicated by the RGB value (255, 0, 0), green by (0, 255, 0), and blue by (0, 0, 255). A shade of pink is represented by the RGB value (255, 175, 175). DigitalImages.html gives a more detailed discussion of image formats and of saving space by reducing color depth. It also gives examples of manipulating individual primary color (RGB) values in individual pixels.

 

Most eyes probably cannot distinguish all 256 shades (no color to maximum color) of the same primary color. The maximum value for red is the bit pattern "11111111" base 2, or 255 base 10. If we change the low-order bit to zero, the result is "11111110" base 2, or 254 base 10. Your eye will not be able to detect the difference. The same is true for the absence of red, "00000000" base 2 or 0 base 10. If we replace the low-order bit with a one, the eye will not detect the resulting shade, "00000001". Remember, this byte is combined with two more primary colors, and the eye sees the mixture of red, green, and blue. Shading variations are even harder to detect when a pixel is surrounded by other pixels, as in a graphic (picture).
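
The low-order-bit change described above amounts to a single bit operation. This minimal Python sketch uses hypothetical helper names. It clears or sets the low-order bit of a color byte, then hides one bit in each of a pixel's R, G and B values.

```python
def set_lsb(value: int, bit: int) -> int:
    """Replace the low-order bit of an 8-bit color value with `bit` (0 or 1)."""
    return (value & 0b11111110) | (bit & 1)


def hide_in_pixel(rgb: tuple[int, int, int], bits: tuple[int, int, int]) -> tuple[int, ...]:
    """Hide one bit in the low-order bit of each of the R, G, and B bytes."""
    return tuple(set_lsb(c, b) for c, b in zip(rgb, bits))


print(set_lsb(0b11111111, 0))                 # 254
print(set_lsb(0b00000000, 1))                 # 1
print(hide_in_pixel((255, 0, 0), (0, 1, 1)))  # (254, 1, 1)
```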

 

A typical picture today consists of 1024 by 768 pixels or more. If the low-order bit of one color is used in each pixel, a total of 1024 x 768 x 1 = 786,432 bits of information can be hidden. At 8 bits per character, the picture could hold 786,432 / 8 = 98,304 characters. If we use the low-order bit of each color of each pixel, the number of available bits becomes 1024 x 768 x 3 = 2,359,296. That is 2,359,296 / 8 = 294,912 characters of hidden text, graphics, or sound. Yes, the color of the picture has been slightly modified, but at a level the human eye cannot detect. In most graphics it would not be unreasonable to store information in the low-order two or three bits of each RGB value without degradation detectable by the user.
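
The capacity arithmetic can be checked with a short Python helper. `lsb_capacity` is a hypothetical name. It assumes 8 bits per hidden character and no space reserved for headers or length fields. Note that 2,359,296 bits / 8 works out to 294,912 characters.

```python
def lsb_capacity(width: int, height: int, channels: int = 1,
                 bits_per_channel: int = 1, bits_per_char: int = 8) -> tuple[int, int]:
    """Return (hidden bits, hidden characters) for simple low-order-bit hiding."""
    bits = width * height * channels * bits_per_channel
    return bits, bits // bits_per_char


print(lsb_capacity(1024, 768))                                  # (786432, 98304)
print(lsb_capacity(1024, 768, channels=3))                      # (2359296, 294912)
print(lsb_capacity(1024, 768, channels=3, bits_per_channel=2))  # (4718592, 589824)
```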

 

It is important to note that pictures larger than 1024 x 768 have become common. These resolutions allow smaller pictures (the message) to be stored within the carrier (container) picture. The same technique can store information in sound and in other computer storage formats, provided their encoding scheme allows for variations (shades) in the data. When images are stored, they are frequently approximated using a gray scale (for drawings) or a 256-color palette (for paintings). A 256-color palette requires only 8 bits per pixel rather than 24 bits, giving a 1:3 compression ratio. The compressed 256-color versions are frequently close enough to the original to satisfy the customer. As an example, most paintings by famous artists were originally distributed in electronic format using 256-color palettes, both via the web and in multimedia applications. This format saved space and transmission time. Consumers either could not tell the copy from the original or did not care about the slight differences. Pictures, video, and sound make excellent containers in which to hide other information, including smaller pictures, video, and text.

 

Note that a graphic, video, or sound file might be compressed before hiding. Compression also obscures the content and offers a weak form of disguise, though it is not true encryption. For even better protection, you might first compress and then encrypt the object to be hidden. This saves space and potentially makes the message harder to detect. It also helps ensure privacy if the message is detected.

 

9) Compression is frequently used to reduce the size of a file or message for transmission or long-term storage. There are basically two kinds of compression: lossless and lossy. Lossless compression means the original can be reproduced (reconstructed) exactly (GIF, BMP). Lossy compression relies on the fact that the user will not be able to tell the picture or sound is different if the changes are small. Lossy compression is detrimental to steganography because the discarded bits are frequently the very bits used to store the hidden message (the low-order bits). GIF and BMP image compression techniques frequently allow for compression while assuring complete integrity. JPEG and many other image storage formats are normally associated with loss of integrity during compression.
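
The effect of lossy compression on low-order-bit hiding can be illustrated with a deliberately crude stand-in. This sketch does not implement JPEG, which quantizes frequency-domain coefficients rather than raw bytes. Instead, it rounds each 8-bit value down to a multiple of 4. Like real lossy coding, this throws away low-order detail. The function names are hypothetical.

```python
def embed_lsb(carrier: list[int], bits: list[int]) -> list[int]:
    """Hide bits in the low-order bit of successive 8-bit values."""
    return [(c & 0xFE) | b for c, b in zip(carrier, bits)] + carrier[len(bits):]


def crude_lossy(values: list[int], step: int = 4) -> list[int]:
    """Stand-in for lossy compression: round each value down to a multiple of
    `step`. (Real JPEG quantizes DCT coefficients, not raw bytes; this only
    mimics the loss of low-order detail.)"""
    return [v - v % step for v in values]


carrier = [200, 201, 57, 58, 130, 131, 90, 91]
secret = [1, 0, 1, 1, 0, 1, 0, 1]
stego = embed_lsb(carrier, secret)
print([v & 1 for v in stego])               # [1, 0, 1, 1, 0, 1, 0, 1]  (intact)
print([v & 1 for v in crude_lossy(stego)])  # [0, 0, 0, 0, 0, 0, 0, 0]  (erased)
```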

 


A good comparison of steganography software may be found at http://www.jjtc.com/Security/stegtools.htm. Another excellent source of information may be found at http://www.petitcolas.net/fabien/steganography/. A particularly easy tool to install and use on Windows systems for GIF and BMP graphics is "S-Tools 4.0." It is available at ftp://ftp.funet.fi/pub/crypt/mirrors/idea.sec.dsi.unimi.it/code/ (Finland) or ftp://idea.sec.dsi.unimi.it/pub/security/crypt/code/ (Italy). It also does an excellent job of making the hidden material undetectable. The user first selects a password and an encryption algorithm. S-Tools then allows the user to drag and drop an image to be hidden on top of an existing image; the container image must be substantially larger. Consider an example using the following pictures.

 

Container: YodaSmall.BMP (960 x 600); image to hide: image006.GIF (213 x 200)

 

In the following example, S-Tools was started first, followed by Windows Explorer. The file YodaSmall.BMP was then selected in Explorer and dropped onto an empty spot in the S-Tools window.

 

 

To hide a graphic, first select it in Explorer. Then drag and drop it on top of a graphic already in the S-Tools window. In this case I dropped image006.GIF on top of YodaSmall.BMP. When the graphic is dropped, S-Tools prompts for a pass phrase and the selection of an encryption algorithm.

 

 

The graphic on the right contains image006.GIF. Note that there is no perceptible difference, even when the images are compared side by side.

 

 

To retrieve a hidden graphic, first right-click the image suspected of containing it. S-Tools then prompts for the pass phrase and encryption algorithm.

 

 

If a hidden graphic is found, S-Tools pops up a window showing the original file name and size. Right-click the file name and select the "Save as" option to complete the retrieval and save the result in a file.

 

 

 

Computer monitors represent graphics (pictures) as a series of dots referred to as pixels. An SVGA screen is typically at least 1024 by 768 pixels. Each pixel is a mixture of three primary colors: red, green, and blue (its RGB value).

 

 

Each color component is represented as an 8-bit binary number. Hence there are 2**8 = 256 intensities for each of the three primary component colors. Each RGB component's intensity is represented by an integer in the range 0 through 255 (none to maximum intensity). In graphic files, RGB values (red, green, blue) are typically stored serially as three consecutive 8-bit binary numbers (bytes). An 8-bit opacity value may also be stored with each pixel, for a total of 32 bits. The eye sees the three primary-color shades as a single blended color. The total number of available colors is 256 * 256 * 256 = 16,777,216 per pixel, which is more shades than the human eye can really differentiate. The general format of a graphics file is:

 

File name: Steg1
Header | R G B | R G B | ... | R G B | eof
(each R, G, and B is one byte; each R G B group is one pixel)

 

For convenience, assume a file in which each pixel is represented by an RGB value (red, green, and blue). The size of a typical graphics file is the size of the picture in pixels times at least 24 bits per pixel. Hence a 1024 by 768 pixel graphic would occupy a file of at least (1024 * 768) pixels * 3 bytes/pixel = 2,359,296 bytes. That is 2,359,296 bytes * 8 bits/byte = 18,874,368 bits. Each graphic is stored on disk with a header giving general information about the file. Typical header information includes the file type (GIF, BMP, PNG, JPG), creation date, size, and related information. The header is followed by the pixel codes.

 

The pixels are highly susceptible to manipulation without noticeable change in picture content or quality. Modify the low-order bits of each RGB value in the following graphic, raising or lowering the current setting by a value in the range 1 to 6. There is a good chance your eye will not be able to discern the change. Now imagine modifying a single pixel in a picture where not all pixels are the same color. It is very difficult to tell that the picture has been modified. The bits you have modified can be used to hide a second graphic within the first. The more bits you modify per pixel, the greater the amount of information that can be hidden within the carrier graphic. However, the more information you hide in a fixed space, the greater the chance of detection.
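
Hiding a second graphic (or any file) in the first can be sketched as writing the payload's bits into the low-order bits of the carrier's pixel data, one bit per byte. This minimal Python sketch makes two assumptions. First, the carrier is raw R, G, B bytes with no file header or row padding, although a real BMP has both. Second, the extractor already knows the payload length. A practical tool would also store a length field and, as discussed above, usually encrypt the payload first. The function names are hypothetical.

```python
def embed_bytes(carrier: bytes, payload: bytes) -> bytearray:
    """Hide payload (most significant bit first) in the low-order bit of
    successive carrier bytes, i.e. R, G, B, R, G, B, ... of the pixel data."""
    bits = [(byte >> (7 - k)) & 1 for byte in payload for k in range(8)]
    if len(bits) > len(carrier):
        raise ValueError("carrier too small for payload")
    out = bytearray(carrier)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # each byte changes by at most 1
    return out


def extract_bytes(stego: bytes, length: int) -> bytes:
    """Read back `length` bytes hidden by embed_bytes."""
    bits = [stego[i] & 1 for i in range(length * 8)]
    return bytes(
        sum(bit << (7 - k) for k, bit in enumerate(bits[j:j + 8]))
        for j in range(0, len(bits), 8)
    )


# Hypothetical carrier: 256 pixels of raw RGB data (768 bytes), no header.
carrier = bytes(range(256)) * 3
stego = embed_bytes(carrier, b"image006")
print(extract_bytes(stego, 8))                          # b'image006'
print(max(abs(a - b) for a, b in zip(carrier, stego)))  # 1
```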

 


To demonstrate the difficulty of detection, a 4 pixel square in a contrasting color has been printed to the left of each graphic. A single modified pixel is located on the same horizontal line, at approximately the same distance inside the solid color.

 

Hiding Strategies:

1) Cipher selection: null or encrypted

 

2) For RGB images:

a) Use only the low order bit of one RGB value.

 

b) Use two or all three low order bits of each RGB value.

 

c) Encode the second graphic in the first, but do not use every pixel.

 

d) Insert the graphic by selecting pixels according to a mathematical formula or pseudo random number generator.

3) Selection Rule: There are no rules. Take advantage of the target.
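
Strategy d) can be sketched by letting a pseudo-random number generator choose which carrier bytes receive message bits. The generator is seeded with a key shared by sender and recipient. This is an illustration only. Python's `random` module is not cryptographically secure, and its `sample` output is not guaranteed to stay identical across Python versions, so both parties should run the same version. The function names are hypothetical.

```python
import random


def keyed_positions(n_slots: int, n_bits: int, key: str) -> list[int]:
    """Pick n_bits distinct carrier positions using a PRNG seeded with a shared key.
    Note: Python's `random` module is not cryptographically secure."""
    if n_bits > n_slots:
        raise ValueError("carrier too small")
    return random.Random(key).sample(range(n_slots), n_bits)


def embed_scattered(carrier: bytes, bits: list[int], key: str) -> bytearray:
    """Write each bit into the low-order bit of a key-selected carrier byte."""
    out = bytearray(carrier)
    for pos, bit in zip(keyed_positions(len(out), len(bits), key), bits):
        out[pos] = (out[pos] & 0xFE) | bit
    return out


def extract_scattered(stego: bytes, n_bits: int, key: str) -> list[int]:
    """Regenerate the same positions from the key and read the bits back."""
    return [stego[pos] & 1 for pos in keyed_positions(len(stego), n_bits, key)]


bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed_scattered(bytes(1000), bits, key="shared secret")
print(extract_scattered(stego, len(bits), key="shared secret"))  # [1, 0, 1, 1, 0, 0, 1, 0]
```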

 

To get a better idea of your eye's ability to differentiate colors consider the following two images of the same graphic.

 

 

Both graphics were created from the same image scanned at 16 million colors. The image on the left was saved using a palette of 256 colors. The image on the right was saved using a palette of 32,766 colors. If you have trouble seeing the difference, focus on the petal in the lower right-hand corner of each graphic.

 

A general-purpose tool to place a steganographic watermark in JPEG graphics as they are served over the web is available at http://www.proxymark.org. The software was written by David Collins at the "Center for Excellence in Digital Forensics" at Sam Houston State University, Huntsville, Texas. It runs in conjunction with your proxy server. It intercepts all JPEG (jpg) graphics as the web server acquires them for transmission and inserts a digital watermark. OutGuess by Niels Provos, at http://www.outguess.org, may be used to hide data in individual JPEG graphics.

 


 

 

Digital sound or light files can also be used to hide information. Sound is typically analog, a continuous wave form, and information is carried in the wave form using frequency or amplitude modulation. To digitize the sound, the wave is sampled as shown, and a number is assigned to represent its magnitude (amplitude) at each sampling point. Each sampling point represents one digital datum. The more frequently the wave form is sampled, the more accurate the digital representation. The quality of the digital representation may also be improved by increasing the range of the number used to measure the amplitude. The trade-off is increased storage. Beyond some point, the observer also cannot tell the quality of the current sampling rate from that of a lower one. The higher the sampling rate, the more steganographic material can be stored in a fixed period of time. The following wave form could be sound or light. If it is light, the wave form may be in the visible or invisible spectrum, e.g., infrared. Sound may likewise be within or outside the range detected by the human ear.

 

The strategy for hiding material in digital representations of light or sound is the same as that for pixels in computer graphics. The low-order bits representing the wave form may be manipulated without discernible impact on the person observing the medium.
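
The sampling and hiding steps can be sketched in Python. For illustration, the sketch assumes a pure 440 Hz tone sampled 44,100 times per second and quantized to 16-bit signed integers. The function names are hypothetical. Each message bit replaces the low-order bit of one sample, so no sample changes by more than 1 out of 65,536 levels.

```python
import math


def sample_sine(freq_hz: float, rate_hz: int, n: int, bits: int = 16) -> list[int]:
    """Sample a full-scale sine wave n times at rate_hz, quantizing each sample
    to a signed integer of the given bit depth (e.g. 16-bit audio)."""
    peak = 2 ** (bits - 1) - 1
    return [round(peak * math.sin(2 * math.pi * freq_hz * t / rate_hz))
            for t in range(n)]


def hide_in_samples(samples: list[int], message_bits: list[int]) -> list[int]:
    """Replace the low-order bit of the first len(message_bits) samples."""
    out = list(samples)
    for i, b in enumerate(message_bits):
        out[i] = (out[i] & ~1) | b  # also correct for negative samples
    return out


samples = sample_sine(440.0, 44_100, 64)  # 440 Hz tone at a 44.1 kHz rate
message = [0, 1, 1, 0, 1, 0, 0, 1]
stego = hide_in_samples(samples, message)
print([s & 1 for s in stego[:8]])  # [0, 1, 1, 0, 1, 0, 0, 1]
```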

 

The examples all assume a digital format. Analog carrier waves such as electricity, sound, and light may be used to hide material using analog techniques as well. Remember, the primary rule is "There are no rules!"

 


 

 

After World War II, steganography techniques received little attention compared to cryptography. Two circumstances have arisen to increase interest in steganography.

 

1) First, governments have attempted to restrict the availability of encryption for secure communications. Governments generally want individuals to be able to protect themselves from other individuals, but not from the government itself.

 

2) Second, interest has been renewed by the desire to protect copyright in audio, video, books, software, and other works available in digital format. Copyright information is first inserted (hidden) in the digital work before publication in a public forum. Illicit copies of digital originals can frequently be made with ease. Such copies can be identified because they contain the hidden copyright, which authorities can extract. A downside is that once the method for inserting the message becomes known, the mark could be removed from other illicit copies. That would make it impossible to prove they were obtained illegally. Techniques that mark digital information to protect copyright are frequently referred to as digital copyright or digital watermarks.

 

One of the first major academic conferences on steganography topics was the International Workshop on Information Hiding, held in Cambridge, UK, in May/June 1996 (http://www.springeronline.com/sgw/cda/frontpage/0,11855,5-0-22-1498671-0,00.html?referer=www.springer.de%2Fcgi-bin%2Fsearch_book.pl%3Fisbn%3D3-540-61996-8). It was held within the research program in computer security, cryptology, and coding theory organized by the proceedings' volume editor at the Isaac Newton Institute in Cambridge. The Fifth International Workshop on Information Hiding was held in Noordwijkerhout, The Netherlands, in October 2002 (http://research.microsoft.com/ih2002/). It presented material of interest to both research and industry. Additional conferences and related information may be located by searching Google for the phrase "International Workshop on Information Hiding" or "steganography."

 

Motivations:

1) Steganography will continue to be a topic of interest, both to provide digital watermarks and to communicate information in hostile environments.

2) Steganography may be especially useful to individuals attempting to hide material from their employers and governments, as it does not attract attention. Even when the employer or government is suspicious, they may not be able to detect what is happening right under their noses.


Protecting Copyright Example:

Steganography can be used as a tool to provide digital copyright. Assume we wish to be able to detect music CDs made illegally. We embed a steganographic image or text string in the music before burning the master CD. All CDs made from the master will contain the hidden copyright in addition to the visible copyright. Copies made from those CDs will also contain the copyright information. If we suspect a CD has been illegally copied, we check whether it contains the hidden copyright.

Note that this may not work more than once. After users become aware of the protection method, they may destroy the steganographic information before making copies. This can frequently be done without impairing the quality of the music. The technique can be extended to graphics, video, and other media.


Creating Steganography Opportunities:

As described in DigitalImages.html, a graphic file may consist of a palette customized for the picture, followed by a vector with one entry per pixel that indexes the palette. Assume a palette with 32,766 colors. The index into the palette requires 16 bits per pixel, i.e., 2**16 = 65,536 possible values, of which 0 through 32,765 are used. Now reduce the number of colors used to represent the picture from 32,766 to 256. Take another look at the images in DigitalImages.html. This reduction can frequently be made with little or no degradation visible to the human eye. A 256-color palette, however, requires only 8 bits per pixel to index it, i.e., 2**8 = 256 with values 0 through 255. Rather than saving space by using 8 bits per pixel, continue to store the graphic at 16 bits per pixel. Both the extra palette entries and the extra bits per pixel are now available to store information. Essentially, more than half the space in the palette and half the space in each pixel is now available to store steganographic information. Another way to look at it is that half the space used to store a 1024 by 768 graphic is now actually available for hidden information.
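
The bit layout for this approach can be sketched as packing two 8-bit values into each 16-bit pixel word. Which byte holds the real palette index, and how a viewer treats the other byte, is a hypothetical choice here: the real index goes in the low byte and hidden data in the high byte. The sketch shows only the packing, not a file format. The names `pack` and `unpack` are also hypothetical.

```python
def pack(palette_index: int, hidden_byte: int) -> int:
    """Store an 8-bit palette index in the low byte of a 16-bit pixel word and
    a hidden byte in the otherwise-unused high byte (hypothetical layout)."""
    if not (0 <= palette_index < 256 and 0 <= hidden_byte < 256):
        raise ValueError("both values must fit in 8 bits")
    return (hidden_byte << 8) | palette_index


def unpack(word: int) -> tuple[int, int]:
    """Return (palette_index, hidden_byte) from a 16-bit pixel word."""
    return word & 0xFF, word >> 8


indices = [12, 200, 7]  # pixels of the 256-color picture
secret = b"Hi!"
words = [pack(i, s) for i, s in zip(indices, secret)]
print(words)                                # [18444, 27080, 8455]
print(bytes(unpack(w)[1] for w in words))   # b'Hi!'
```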

This same approach may be used with sound or almost any other medium that has been digitized. Specifically there are two approaches.

1) Using a graphic for illustration, the first approach is not to increase the resolution, only the number of bits used to represent the medium. There is no loss of quality, and the extra bits are all available to store additional information. The typical example would be going from 16-bit color to 24-bit color. No matter how many bits you add to the representation, you cannot increase the actual resolution of the source.

Example: You are viewing a picture with a graphics editor. The editor states that the picture contains 32,766 or fewer colors using 24 bits per pixel. Only 16 bits per pixel are needed to represent the picture, since 2**16 = 65,536 exceeds 32,766. You have to wonder why there are an extra 8 bits per pixel!

2) The second approach is to reduce the number of bits needed to represent the medium, e.g., from 32,766 colors to 256 colors (16-bit color to 8-bit color). There is a loss of quality, but the user may not be able to perceive it. Again, continue to store the medium at the higher bit depth. The additional bits (50% in the example) can be used to store hidden information. Changing the sampling rate for music can likewise greatly increase the storage available to hide information.

 

In any case, the number of recording standards for pictures, video, and sound is increasing. Even within a specific recording standard such as GIF or JPEG, multiple formats exist. Each recording standard, and each variation in the formats it supports, provides additional opportunities to employ steganography.


References:

  • Digital Forensics, Sam Houston State University, http://www.df.shsu.edu
  • Steganography Tools, http://www.jjtc.com/Security/stegtools.htm
  • The Information Hiding Homepage: Digital Watermarking & Steganography, http://www.petitcolas.net/fabien/steganography/
  • Steganography Analysis and Research Center, http://www.sarc-wv.com/
  • Neil F. Johnson, Zoran Duric, and Sushil Jajodia, "Information Hiding: Steganography and Watermarking - Attacks and Countermeasures."
  • StegoArchive.Com, http://www.stegoarchive.com/
  • Proxymark by David Collins, http://www.proxymark.org
  • OutGuess by Niels Provos, http://www.outguess.org

 

 

 
