Because graphics files are stored as binary data and are unreadable by humans (actually, parts of graphics files are readable by humans, if you know what you’re looking for), most people are intimidated into not looking under the hood at the internals of image file formats. Of course, it is a Good Thing that as a web author you can think of an image as a “black box” that somehow understands its own image-ness. But image file formats are not necessarily inscrutable objects if you really want to know how they work, and understanding the structure of the files that you work with on a daily basis can help you remember the vagaries of image manipulation. Knowing how a GIF file is formatted, for example, will help you answer these questions:[2]
Why isn’t a GIF with 129 colors smaller than one with 256 colors?
Can a multi-image GIF have more than one transparent color?
What is the maximum color depth of a GIF?
How does a decoder program know that a file is a GIF?
How can I make the smallest possible multi-image file?
Hopefully this chapter will help demystify image file formats and help you feel more at home with the binary black boxes called GIFs, PNGs, and JPEGs.
Creating graphics file formats for distribution over variable speed communications networks (such as the Internet) poses a number of problems. Each end user’s computer may be connected at speeds as slow as 14.4 kilobits per second or as fast as several megabits per second, and you would like them all to be able to download and display graphics at some sort of reasonable speed. The Internet started as a place where the common coin was text. ASCII text is easy; one byte[3] per character keeps the average missive to a size where near-instantaneous communication is possible. Graphics, however, are much more information-intensive. The proverbial picture worth a thousand words can actually translate into hundreds of thousands of words when it comes to sending that picture over the Internet. To deal with network graphics, people have developed a toolkit of structuring conventions and compression tricks that make possible the graphics-intensive Web that we know and love. This section will provide an overview of this vocabulary and point out how GIF, PNG, and JPEG (what we will call the web graphics formats) actually implement these concepts.
No, this section is not about hunting and fishing. Web graphics formats can be thought of as data streams broken up into fields (so much for the outdoor activity metaphor). Everything that is transferred over the Web may be thought of as a data stream—a series of data packets received one at a time and assembled into a sequential data structure. This data structure is in turn divided into fields. The GIF and JPEG formats call these fields blocks, and PNG calls them chunks. Fields are a fixed, predictable data structure stored within an image file whose layout is defined by the file format specification. Typically a field will contain information about an image’s dimensions, how the colors are defined within the image, special information needed by a display device to properly render the image, etc. These fields of information are often structured so that it is easy for a program displaying the image to quickly extract all the information it needs.
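To make the idea of fields concrete, here is a minimal sketch that pulls the image dimensions out of the leading fields of a GIF or PNG data stream. The byte layouts used here (a 6-byte GIF signature followed by 16-bit little-endian width and height, and an 8-byte PNG signature followed by an IHDR chunk holding 32-bit big-endian width and height) come from the respective format specifications rather than from this section, and the function name read_dimensions is our own invention for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_dimensions(data: bytes):
    """Return (format, width, height) from the leading header fields."""
    if data[:6] in (b"GIF87a", b"GIF89a"):
        # GIF: the signature is followed by the Logical Screen Descriptor,
        # which starts with 16-bit little-endian width and height.
        width, height = struct.unpack("<HH", data[6:10])
        return "GIF", width, height
    if data[:8] == PNG_SIGNATURE:
        # PNG: the signature is followed by the IHDR chunk:
        # 4-byte length, 4-byte type, then 32-bit big-endian width/height.
        length, chunk_type = struct.unpack(">I4s", data[8:16])
        if chunk_type != b"IHDR":
            raise ValueError("first PNG chunk is not IHDR")
        width, height = struct.unpack(">II", data[16:24])
        return "PNG", width, height
    raise ValueError("not a GIF or PNG header")
```

Calling read_dimensions on the first couple of dozen bytes of a file is enough; a decoder never has to read the image data itself to learn how big the image is.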
Image transmission is always a tradeoff between two limiting factors: the time it takes to transfer the image over the network versus the time it takes to decode the image. JPEG, for example, is a highly compressed format that allows for small files and quick transmission times but requires longer to decode and display. The format works very well because generally the network is the bottleneck, with the average desktop computer perfectly able to perform the necessary decoding operations in a reasonable amount of time. Of course, the ideal is a very small file that is very easily decoded. In practice, it is always a tradeoff.
An image with a color depth of 24 bits per pixel (or more) is known as a truecolor image. Each pixel in the image is saved as a group of three bytes, one for each of the red, green, and blue elements of the pixel. Each of the RGB elements can be represented as one of 256 (2^8) values, which gives us 256^3 or 16,777,216 possible colors: 8 red bits + 8 green bits + 8 blue bits = 24 bits. This also means that a 200 × 200 pixel truecolor image saved in an uncompressed format would take up 120K for the image data alone, and a 500 × 500 pixel image would take up 750K. Both of these images would be too large to put on a web page, which is a reason that some image formats store color information in color tables that make for a far smaller image file size (see Section 1.1.4 later in this chapter).
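The arithmetic above is easy to check. This is only a back-of-the-envelope sketch; the function name uncompressed_bytes is hypothetical, and it counts the raw pixel data alone, with no headers or compression.

```python
def uncompressed_bytes(width, height, bits_per_pixel=24):
    """Size of the raw pixel data alone, in bytes (rounded up)."""
    return (width * height * bits_per_pixel + 7) // 8

print(2 ** 8)                        # 256 values per red, green, or blue element
print(256 ** 3)                      # 16777216 possible colors
print(uncompressed_bytes(200, 200))  # 120000 bytes, the "120K" above
print(uncompressed_bytes(500, 500))  # 750000 bytes, the "750K" above
```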
The PNG format allows you to save color images with a depth of up to 48 bits per pixel, or grayscale images at 16 bits per pixel. This is actually beyond the display capacity of most consumer video hardware available today, where 24-bit color is the standard. JPEG will also let you store images with a color depth of up to 36 bits.
An image with a color depth of 8 bits is sometimes called a pseudocolor or indexed color image. Pseudocolor allows at most 256 colors through the use of a palette, which is sometimes also referred to as a color index or a Color Lookup Table (CLUT). Rather than storing a red, green, and blue value for each pixel in the image, an index to an element in the color table (usually an 8-bit index) is stored for each pixel. The color table is usually stored with the image, though many applications should also provide default color tables for images without stored palettes.
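A small sketch may help here. The four-color palette and the 4 × 2 image below are made up for illustration, as are the function names; a real color table would hold up to 256 entries.

```python
# A tiny 4-entry palette (color table): index -> (red, green, blue)
palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (255, 255, 255)]

# A 4 x 2 indexed image: each pixel stores an index into the palette
indexed_pixels = [
    [0, 1, 1, 3],
    [2, 2, 0, 3],
]

def expand_to_rgb(pixels, table):
    """Replace every palette index with its RGB triple."""
    return [[table[index] for index in row] for row in pixels]

def indexed_bytes(width, height, table_entries=256):
    """One byte per pixel plus three bytes per color table entry."""
    return width * height + 3 * table_entries

print(expand_to_rgb(indexed_pixels, palette)[0][1])  # (255, 0, 0)
print(indexed_bytes(200, 200))                       # 40768
```

For a 200 × 200 image, the indexed form needs 40,768 bytes before compression, against 120,000 bytes for the same image stored as three bytes per pixel.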
To save a “real world” image (i.e., something with more than 256 colors) as a pseudocolor image, you must first quantize it to 256 colors. Quantization alone will usually give you an image that is unacceptably different than the source image, especially in images with many colors or subtle gradients or shading. To improve the quality of the final image, the quantization process is usually coupled with a dithering process that tries to approximate the colors in the original by combining the colors in various pixel patterns. Figure 1.1 shows an original image, the same image quantized to an “optimal” 256 colors (the 256 colors that occur most frequently in the image), and the image quantized and dithered with the Floyd-Steinberg dithering process.
Figure 1-1. A 24-bit image (top) must be quantized to 256 or fewer colors to save it as an 8-bit indexed image (left). Usually, dithering is applied (right) to improve the image quality.
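The dithering step can be sketched in a few lines. To keep the example short we make a simplifying assumption: the image is grayscale (rows of values from 0 to 255) and the "palette" is just a list of allowed gray levels, whereas a real quantizer works on RGB colors and a 256-entry color table. The error weights (7/16, 3/16, 5/16, 1/16) are the classic Floyd-Steinberg ones; the function names are ours.

```python
def nearest(value, levels):
    """Pick the allowed level closest to value (plain quantization)."""
    return min(levels, key=lambda level: abs(level - value))

def floyd_steinberg(gray, levels):
    """Quantize rows of 0-255 values to `levels`, pushing each pixel's
    rounding error onto neighbors that have not been visited yet."""
    height, width = len(gray), len(gray[0])
    work = [[float(v) for v in row] for row in gray]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = nearest(old, levels)
            out[y][x] = new
            err = old - new
            # 7/16 to the right, 3/16 below-left, 5/16 below, 1/16 below-right
            for dx, dy, weight in ((1, 0, 7), (-1, 1, 3), (0, 1, 5), (1, 1, 1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and ny < height:
                    work[ny][nx] += err * weight / 16
    return out

flat_gray = [[128] * 8 for _ in range(8)]
for row in floyd_steinberg(flat_gray, (0, 255)):
    print(row)
```

Quantization alone would turn the flat mid-gray square into solid white; with error diffusion the result is a pattern of black and white pixels whose average stays close to the original gray.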
Creating indexed color images for use on the Web has a number of pitfalls, which are discussed in Chapter 2.
The GIF file format is an indexed color file format, and a PNG file can optionally be saved as an indexed color image. A GIF file will always have at most 256 colors in its palette, though multiple palettes may be stored within a multi-image file, so the 256-color limit is only applicable to one image of a multi-image sequence. A PNG may also have a 256-color palette. Even if a PNG image is saved as a 24-bit truecolor image, it may contain a palette for use by applications on platforms without truecolor capability.
Transparency in web graphics allows background colors or background images to show through certain pixels of the image. Generally, transparency is used to create images with irregularly shaped borders (i.e., non-rectangular images). The three primary file formats have varying degrees of support for transparency.
Right off the bat, we should note that transparency is not currently supported in JPEG files, and it will most likely not be supported in the future, because of the particulars of the JPEG compression algorithms and the niche at which JPEG is aimed.
The GIF file format creates transparency by allowing you to mark one index in a color table as the transparent color. The display client should use this transparency index when displaying the image; pixels with the same index as the transparency index should simply be “left out” when the image is drawn. Each image in a multi-image sequence can have its own transparent index.
A single transparency index takes up one byte in the GIF file, as part of a Graphics Control Extension Block, described in the GIF section (later in this chapter). The PNG format allows for better transparency support by allowing more space for describing the transparent characteristics of the image (see Figure 1.2), though the full range of its capabilities is not necessarily supported by all web clients. PNG images that contain grayscale or color data that has been sampled at a rate between 8 and 16 bits per sample may also contain an alpha channel (also called an alpha mask), which is an additional 8 to 16 bits (depending on the image color depth) that represents the transparency level of that sample. An alpha level of 0 indicates complete transparency (i.e., the pixel should not be displayed) and an alpha value of 2^n - 1 (where n is the color depth) indicates that the pixel should be completely opaque. The values in between indicate a relative level of semi-transparency. Again, the actual implementation of the display of these semi-transparent pixels is left to the display client, and robust web browser support for the full range of possibilities presented by a full alpha channel has been spotty in the past. Consult your favorite browser’s documentation for the details of its alpha support.
Figure 1-2. A GIF with a single transparent color (left) versus a PNG with a full alpha channel (right)
In a PNG with a color table, alpha values can be assigned to one or more entries in the table through the use of a tRNS chunk, described in Section 1.3 of this chapter.
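The two transparency models can be compared in a short sketch. Both functions are hypothetical illustrations of what a display client might do, not code from any browser: the first skips pixels that match a GIF-style transparent index, and the second blends one sample over another using an alpha value where 0 is fully transparent and 2^n - 1 is fully opaque. The blend formula is the usual linear mix, which is our assumption, since the formats leave the details to the client.

```python
def draw_indexed_row(row, palette, transparent_index, background_row):
    """GIF-style: a pixel whose index equals the transparent index is
    left out, so the background color shows through."""
    return [bg if index == transparent_index else palette[index]
            for index, bg in zip(row, background_row)]

def blend(foreground, background, alpha, depth=8):
    """PNG-style: mix one sample over another; alpha runs 0..2**depth - 1."""
    max_alpha = 2 ** depth - 1
    mixed = foreground * alpha + background * (max_alpha - alpha)
    return (mixed + max_alpha // 2) // max_alpha  # round to nearest

palette = [(0, 0, 0), (255, 0, 0)]
page = [(200, 200, 200)] * 3
print(draw_indexed_row([1, 0, 1], palette, 0, page))
print(blend(255, 0, 0), blend(255, 0, 128), blend(255, 0, 255))
```

With a single transparent index, each pixel is either drawn or not; with an alpha channel, each sample can take any of the intermediate levels.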
For the purposes of high-level web graphics programming, we only need to understand a couple of high-level concepts about data compression. One is the distinction between “lossy” and “lossless” compression formats, and the other has to do with intellectual rights and freedoms.
People generally interpret the term “lossy” compression to mean that there is information lost in the translation from source image to compressed image, and that this information loss results in a degraded image. This is true to a point. However, you could also argue that there is information lost in the process of creating a GIF (a so-called “lossless” storage format) from a 24-bit source image, since the number of colors in the image must first be reduced from millions to 256. A more accurate definition of lossy would be something like “a compression algorithm that loses information about the source image during the compression process, and repeated inflation and compression will result in further degradation of the image.”
JPEG is an example of a lossy compression format. PNG and GIF are both examples of lossless compression. A lossless compression algorithm is one that does not discard information about the source image during the compression process. Inflation of the compressed data will exactly restore the source image data.
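The lossless half of that distinction can be demonstrated with Python's zlib module, which implements the Deflate algorithm. The pixel data below is a made-up stand-in, and the crush function is only a crude illustration of discarding information; it is not how JPEG works.

```python
import zlib

source = bytes([10, 10, 10, 200, 200, 200] * 1000)  # stand-in for pixel data

# Lossless: inflating the deflated data restores it exactly.
packed = zlib.compress(source, 9)
assert zlib.decompress(packed) == source
print(len(source), "->", len(packed), "bytes")

# Lossy stand-in: dropping the low 4 bits of every sample saves space,
# but no decoder can recover what was thrown away.
def crush(samples: bytes) -> bytes:
    return bytes(sample & 0xF0 for sample in samples)

assert crush(source) != source
```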
The distinction between these two methods of compressing image data actually affects the way you do your everyday work. Suppose, for example, you have created a number of images for a web site, which are meant to be served as JPEGs (a lossy format) because they contain nice gradients that would look less-than-spiffy as GIFs (and you haven’t explored the possibility of PNG yet). Suppose you created all these images in Photoshop (or even better, the Gimp) and saved them as JPEGs, but neglected to save the original source files. Then suppose that your client wanted the images cropped slightly differently and you had to reopen those JPEGs, edit the images, and re-save them. This would be a very painful way to learn the meaning of the term lossy compression, because the resulting images would most likely be less than presentable, as you can see in Figure 1.3.
Figure 1-3. Once an image is saved as a JPEG (left), repeated decoding and encoding can result in information loss and poor image quality (right)
Generally a JPEG can be decoded and re-encoded and, as long as the quality setting is the same, you will not get visible degradation of the image.[4] If you change any part of the image, however, the changed part of the image will lose even more information when it is re-encoded. If an image is cropped or scaled to a different size, the entire image will lose more information.
Tip
A quick aside about saving images: Proper procedure is to keep a copy of the original artwork in a format that retains all of the image’s information. If you are using an image manipulation program like Photoshop or the Gimp, you will probably want to save it in the program’s native format (PSD and XCF, respectively) to preserve information about layers and channels and whatnot. Or save it as a PNG. PNG is in general a superior file format for multiple-purpose images, as we will see later in this chapter.
The other important bit of information to know about image compression is more of a legal issue than a technical one. “Do I have to pay licensing fees?” is one of the more frequently asked questions about the GIF file format. In a nutshell, GIF is not free and unencumbered software because CompuServe, the creators of GIF, used the LZW (Lempel-Ziv-Welch) codec (compression-decompression algorithm) to implement its data compression. The patent for the LZW algorithm[5] is owned by the Unisys Corporation, which requires a licensing fee for any software that uses the LZW codec. The GIF file format does not allow the storage of uncompressed data, or data compressed by different algorithms, so if you use GIF, you must use LZW. There is some confusion as to exactly what uses are covered by the patent, but Unisys has taken the matter to court a number of times. The LZW patent has been called (by Unisys) “one of the most broadly licensed patents in history.” I am not a lawyer so I will not attempt to offer an opinion on the matter. However, there has been much discussion of this topic on Usenet by people with a lot of opinions.
Because of this patent and the tendency of the software world toward open standards and open source, an “anti-GIF” movement has been prevalent on the Web for quite some time, though it has been thus far unsuccessful[6] because of the momentum gathered by GIF early on, and the relatively slow adoption of browser support for other graphics formats.
The PNG file format was developed as an alternative to GIF. The compression algorithm used by PNG is actually a version of the Deflate algorithm used by the pkzip utility. Deflate is, in turn, a subset of the LZ77 class of compression algorithms (yes, that’s the same L and Z as in LZW compression). PNG’s compression method does not use any algorithms with legal restrictions, however. This is one of its major selling points.
The JPEG file format uses JPEG compression. That may seem like a tautologous statement, but it’s not. We’ll get to the semantics of JPEGs in a little bit.
All three of the standard web graphics formats support a means for providing the progressive display of an image as it is downloaded over the network. Why further complicate an image file to support progressive display? It offers a big perceptual gain in download speed. The idea is that partial information about an entire image may be shown and the display may be refined as the image downloads, rather than displaying a refined image one row at a time. Progressive display is achieved by interlacing, a technique in which pixels are saved in a nonconsecutive order and then drawn in the order they are received from the stream. The result is an image drawn as a grid of pixels that is progressively filled in with more information. Interlacing is implemented differently by different file formats.
It should be noted that interlaced files tend to be slightly larger than non-interlaced files (except for Progressive JPEGs, which tend to be slightly smaller). This is because most compression schemes make certain assumptions about the relationships of adjacent pixels in an image, and the interlacing process can disrupt the “natural” ordering of pixels that works well with compression algorithms. Interlacing can more than make up for the slight difference in file size with a perceptual download speedup, however.
The image data for a GIF file is stored by the row (or scanline), with one byte representing each pixel. A non-interlaced GIF will simply store each scanline consecutively in the image data field of the GIF file. An interlaced GIF will still group pixels into scanlines, but the scanlines will be stored in a different order. When the GIF file is encoded, the rows will be read and saved in four passes; the even-numbered rows (using a 0-based counting system) will be saved in the first three passes, and the odd-numbered rows will be saved in the final pass. The interlacing algorithm is, in words:
Pass One: Save Row 0, then save every 8th row thereafter (0, 8, 16...).
Pass Two: Save Row 4, then save every 8th row thereafter (4, 12, 20...).
Pass Three: Save Row 2, then save every 4th row thereafter (2, 6, 10, 14, 18...).
Pass Four: Save every odd row.
Graphically, with each pixel coordinate labeled with the pass on which it is saved and rendered, GIF interlacing would look like this:
1 1 1 1 1 1 1 1
4 4 4 4 4 4 4 4
3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4
2 2 2 2 2 2 2 2
4 4 4 4 4 4 4 4
3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4
When the image is later reconstituted, the display client (the web browser) will usually temporarily fill in the intervening rows of pixels with the values of the nearest previously decoded rows, as you can see by looking at the progressive stages in an interlaced GIF display shown in Figure 1.4. The interlacing approach taken by the GIF format displays after one pass a version with one-eighth vertical resolution of the entire image, one-quarter resolution after two passes, one-half resolution after three, and the complete image after the fourth. In many cases the user can interpret the image after only the first or second pass.
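The four GIF passes described above boil down to a starting row and a row step for each pass. A minimal sketch (the function name is ours):

```python
def gif_interlace_order(height):
    """Row numbers in the order an interlaced GIF stores them."""
    passes = ((0, 8), (4, 8), (2, 4), (1, 2))  # (first row, row step)
    order = []
    for first_row, step in passes:
        order.extend(range(first_row, height, step))
    return order

print(gif_interlace_order(8))   # [0, 4, 2, 6, 1, 3, 5, 7]
print(gif_interlace_order(20))
```

A decoder reverses the process by writing the n-th scanline it receives into row gif_interlace_order(height)[n] of the output image.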
PNG uses a slightly different interlacing scheme than GIF. While GIF completes the interlacing in four passes, where the first three passes count even scan lines, PNG uses a seven-pass scheme called Adam7 (named after its creator, Adam M. Costello), where the first six passes contribute to the even rows of pixels, and the seventh fills in the odd rows. Because PNG files do not necessarily have to store pixels in a contiguous scanline, each pass contains only certain pixels from certain scanlines. In words, the interlacing algorithm (which reads like a Fluxus poem or a word problem from a Mensa test in spatial visualization) is:
Pass One: Save every 8th pixel (starting with Pixel 0) on every 8th row (starting with Row 0).
Pass Two: Save every 8th pixel (starting with Pixel 4) on every 8th row (starting with Row 0).
Pass Three: Save every 4th pixel (starting with Pixel 0) on every 8th row (starting with Row 4).
Pass Four: Save every 4th pixel (starting with Pixel 2) on every 4th row (starting with Row 0).
Pass Five: Save every even pixel on every 4th row (starting with Row 2).
Pass Six: Save every odd pixel on every even row.
Pass Seven: Save every pixel on every odd row.
Graphically, this looks like the grid below, where each pixel in an 8 × 8 block is labeled with the pass on which it appears on the screen:
1 6 4 6 2 6 4 6
7 7 7 7 7 7 7 7
5 6 5 6 5 6 5 6
7 7 7 7 7 7 7 7
3 6 4 6 3 6 4 6
7 7 7 7 7 7 7 7
5 6 5 6 5 6 5 6
7 7 7 7 7 7 7 7
This scheme leads to a perceptual speed increase over the scanline interlacing used by GIF. After the first pass, only 1/64th of the image has been downloaded, but the entire image can be drawn with 8 × 8 pixel resolution blocks. After the second pass, 1/32nd of the file has been transferred, and the image can be drawn at a 4 × 8 pixel block resolution. Small text in an image is readable after PNG’s 5th pass (25% of the file downloaded) which compares favorably with GIF’s interlacing gains, where small text is typically readable after the 3rd pass (50% of the file downloaded).
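The seven Adam7 passes can be written as a small table of starting positions and steps, taken directly from the description above. The sketch below (names are ours) reproduces the 8 × 8 grid and the fraction of pixels delivered after each pass.

```python
# (first column, first row, column step, row step) for passes 1 to 7
ADAM7 = (
    (0, 0, 8, 8),
    (4, 0, 8, 8),
    (0, 4, 4, 8),
    (2, 0, 4, 4),
    (0, 2, 2, 4),
    (1, 0, 2, 2),
    (0, 1, 1, 2),
)

def adam7_pass(x, y):
    """Return the pass (1-7) on which pixel (x, y) is saved."""
    for number, (x0, y0, dx, dy) in enumerate(ADAM7, start=1):
        if (x - x0) % dx == 0 and (y - y0) % dy == 0:
            return number
    raise AssertionError("every pixel belongs to some pass")

grid = [[adam7_pass(x, y) for x in range(8)] for y in range(8)]
for row in grid:
    print(*row)

# Fraction of the 64 pixels in a block delivered by the end of each pass
delivered = 0
for number in range(1, 8):
    delivered += sum(row.count(number) for row in grid)
    print(number, delivered / 64)
```

The running total reaches 16 of 64 pixels (25%) at the end of the fifth pass, which is where the figure quoted above comes from.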
JPEG files may also be formatted for progressive display support. Progressive JPEG is considered an extension to the JPEG standard and progressive display is not fully implemented by all web clients.
The scanline interlacing techniques used by GIF and PNG are not applicable to JPEG files because JPEGs are a more abstract way of storing an image than a simple stream of pixels. (It is more accurate to call a JPEG file a collection of Discrete Cosine Transform coefficients that describe a pixel stream, but more on that later.) Essentially, a Progressive JPEG that is displayed as it is transferred over the network would first show the entire image as if it had been saved at a very low quality setting. On successive passes the image would gradually resolve into the complete image, at the quality level at which it was saved.
Progressive JPEGs are not yet the most efficient means of progressive display, as the entire image must be decoded with each subsequent pass; however, the JPEG format offers such substantial compression that progressive display is not as important as for other file formats.
For quite a while, the only file formats that could be used on the Web for general-purpose images were GIFs and JPEGs. Each format has applications at which it excels and applications at which it, in the popular parlance, sucks. The adoption of PNG as a standard format has made the question of what to use when a bit fuzzier. In general, PNG is intended as a replacement for GIF, but there are some applications for which PNG can be used effectively instead of JPEG, and there are still applications where GIF must be used. To start with, we should summarize some of the details discussed in the previous section. Table 1.1 provides an overview of the formats.
Table 1-1. File Format Comparison
Category | GIF | PNG | JPEG |
---|---|---|---|
Truecolor support | No | Yes | Yes |
Color table support | Yes | Yes | No |
Maximum size of color table | 256 | 256 | — |
Maximum color depth | 8-bit indexed | 8-bit indexed; 16-bit grayscale; 48-bit RGB; +16 bits w/alpha | 12-bit grayscale; 36-bit “RGB”; 32-bit CMYK |
Transparency support | Yes | Yes | No |
Alpha channel | No | Yes | No |
Max alpha channel depth | — | 16 bit | — |
Maximum image size (pixels) | 64K × 64K | 2Gig × 2Gig | 64K × 64K |
Multiple images per file | Yes | No | No |
Byte ordering | Little-endian | Big-endian | Big-endian |
Compression | LZW | Deflate | JPEG |
Compression ratio | 3:1 to 5:1 | 4:1 to 10:1 | 5:1 to 100:1 |
Compression method | Lossless | Lossless | Lossy |
Progressive display | Yes | Yes | Yes |
Interlacing style | Scan line | Adam7 | PJPEG |
What follows is a kind of “consumer reports” for web graphics formats, suggesting appropriate formats for various applications. All of the comparative images started from the same RGB source image. They were all created with the Gimp (GNU Image Manipulation Program, which is described in Chapter 7) and saved with the default settings for each format. The default quality rating for JPEGs is .75 and the default compression setting for PNG is 9.0.
Only the GIF and PNG formats support transparency, which is required to create irregularly shaped images. If you use PNG, be aware that full alpha support is not included in most browsers. Figure 1.5 shows a circular image saved in the three formats.
In general, photographs should be saved as JPEGs, which will allow a greater compression ratio and maximum image quality. Photographs in PNG are much larger. If the photograph is irregularly shaped or requires transparency, you may have to compromise with GIF, which will reduce the quality but still achieve a file size that is easily transferable (see Figure 1.6).
GIF or PNG should be used for images with text. If the image is a graphical menu and is larger than a few kilobytes, save it as an interlaced PNG, which will allow the user to read the menu after only a quarter of the file has downloaded. JPEG encoding does not deal with sharp edges very well and will introduce artifacts that will mar the appearance of the text (see Figure 1.7).
Grayscale images with 256 or fewer levels should be saved as GIFs or PNGs. Even grayscales with more than 256 levels do not necessarily need to be converted to JPEGs; for most applications, 256 levels of gray is sufficient, and the GIF or PNG will give you lossless compression that is not too different from JPEG’s grayscale compression (see Figure 1.8).
For line drawings with 256 colors or fewer, GIF or PNG should be used. For line drawings with more than 256 colors, PNG should be used. JPEGs are not designed to handle line drawings (see Figure 1.9).
Until MNG files (Multiple-image Network Graphics, a multi-image-capable version of PNG) become widely supported, you’ll have to go with GIF89a for animated images, since neither PNG nor JPEG supports multiple images within a file. Figure 1.10 shows the frames of a multi-image GIF89a file.
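The recommendations above can be restated as a small decision function. This is only a hypothetical helper that encodes the rules of thumb from this section; the function name, the argument names, and the category labels are ours, and real projects will have to weigh file size and browser support as well.

```python
def suggest_format(kind, colors=256, transparent=False, animated=False):
    """Restate the guidelines above.
    kind is one of "photo", "text", "grayscale", or "line"."""
    if animated:
        return "GIF"  # only GIF89a stores multiple images in one file
    if kind == "photo":
        # JPEG has no transparency, so compromise with GIF if it is needed
        return "GIF" if transparent else "JPEG"
    if kind == "line" and colors > 256:
        return "PNG"
    return "GIF or PNG"  # text, grayscale, and simple line drawings

print(suggest_format("photo"))                    # JPEG
print(suggest_format("photo", transparent=True))  # GIF
print(suggest_format("line", colors=1000))        # PNG
print(suggest_format("text"))                     # GIF or PNG
```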
With that overview of the strengths and weaknesses of the different formats, the next three sections delve into the nitty-gritty of the actual file formats themselves. These sections will be especially instructive if you have never peeked into the “black box” before.
[2] Answers to these questions are given in the GIF section later in this chapter.
[3] Modern character sets are getting bulkier, however; the UCS-2 Unicode encoding uses two bytes per character, and the UTF-8 encoding uses from one to four bytes per character.
[4] Actually, there is a form of lossless JPEG, but it has not been implemented in the world of web clients.
[5] U.S. Patent No. 4,558,302
[6] “Unsuccessful” in the sense that virtually every web page has at least one GIF on it.