JPEG stands for Joint Photographic Experts Group, the name of the committee set up by the International Organization for Standardization (ISO) that originally wrote the image format standard. The JPEG committee is responsible for determining the future of the JPEG format, but the actual JPEG software that makes up the toolkit used in most web applications is maintained by the Independent JPEG Group (http://www.ijg.org).
The JPEG standard defines only an encoding scheme for data streams, not a specific file format. JPEG encoding is used in many different file formats (TIFF 6.0 and Macintosh PICT are two prominent examples), but the format used on the Web is called JFIF, an acronym for JPEG File Interchange Format, which was developed by C-Cube Microsystems (http://www.c-cube.com) and placed in the public domain. JFIF became the de facto standard for web JPEGs because of its simplicity. When people talk about a JPEG web graphic, they are actually referring to a JPEG-encoded data stream stored in the JFIF file format. In this book we will refer to JFIFs as JPEGs to reduce confusion, or to further propagate it, depending on your point of view.
For the best results, create a JPEG from a high-quality image sampled at a large bit depth, from 16 to 24 bits. You should generally use JPEG encoding only on scanned photographs or continuous-tone images (see Section 1.1.6 earlier in this chapter).
JPEG encoding takes advantage of the fuzzy way the human eye interprets light and colors in images by throwing out certain information that is not perceived by the viewer. This process creates a much smaller image that is perceptually faithful to the original. The degree of information loss may be adjusted, so the size of an encoded file can be reduced at the expense of image quality. The quality of the resulting image is expressed in terms of a Q factor, which may be set when the image is encoded. Most applications use an arbitrary scale of 1 to 100, where the lower numbers indicate smaller, lower-quality files and the higher numbers indicate larger, higher-quality files. Note that a Q value of 100 does not mean that the encoding is completely lossless (although you really won’t lose much). Also, the 1 to 100 scale is by no means standardized (the GIMP uses a 0 to 1.0 scale), but this is the scale used by the IJG software, so it is what we will use here. There are a few guidelines for choosing an optimal Q factor:
The default value of 75 is appropriate for most purposes; a value as low as 50 is acceptable for web applications. This is a good starting point anyway, yielding a compression ratio of 10:1 or 20:1.
The Q factor should never be set above 95. Values higher than 95 cannot be distinguished from 95. In practice, you may find images with Q factors of 75 or above nearly indistinguishable from each other in quality.
Very-low-resolution thumbnails can have a Q value as low as 5 or 10.
Most Progressive JPEG encoders expect a Q value in the range of 50-75. Values outside this range will not get the most out of the Progressive JPEG scheme.
A rule of thumb for estimating the effectiveness of JPEG compression is that it will save an image at 1 to 2 bits per pixel. It does this by running the data through an elaborate encoding process.
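As a rough illustration of this rule of thumb, the following Python sketch estimates the encoded size of an image at 1 and 2 bits per pixel and compares it with the uncompressed 24-bit original. The function name `estimate_jpeg_size` is made up for this example, and actual results depend heavily on the image content and the Q factor chosen.

```python
def estimate_jpeg_size(width, height, bits_per_pixel):
    """Return an estimated encoded size in bytes at the given bit rate."""
    return width * height * bits_per_pixel / 8


width, height = 640, 480
raw_bytes = width * height * 3  # uncompressed 24-bit RGB, 3 bytes per pixel

for bpp in (1, 2):
    size = estimate_jpeg_size(width, height, bpp)
    print(f"{bpp} bpp: about {size / 1024:.1f} KB, ratio {raw_bytes / size:.0f}:1")
```

For a 640 × 480 image, the estimate works out to between 37.5 KB and 75 KB, which corresponds to compression ratios of roughly 24:1 and 12:1 against the 900 KB original.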
To encode a 24-bit image as a JPEG, the image goes through a four-step assembly line. Decoding the image essentially reverses the process, though rounding errors and certain assumptions made in the encoding process make JPEG a lossy form of compression. Compression is lossy when the decompression process cannot reproduce the original data exactly, bit for bit. The four steps of the encoding process are:
Apply color space transform and downsample
Apply Discrete Cosine Transform (DCT) to blocks of data
Quantize the DCT coefficients of each block
Apply Huffman encoding
The first step in the encoding process accounts for about 50% of the space savings of JPEG encoding (for color images, anyway; grayscale images pass immediately to step 2 and are thus inherently less susceptible to JPEG compression). Taking advantage of the fact that the human eye responds more to changes in levels of brightness than to changes in particular colors in adjacent pixels, this step first changes the color space of the image from RGB to (usually) YCbCr. The YCbCr color space represents an image as three components: brightness, or luminance (Y), and two chrominance components (Cb for blue chrominance and Cr for red chrominance). Of these components, luminance is the most important, so the two chrominance components are downsampled to reduce the amount of information we need to store. Downsampling means that only one chrominance value pair is stored for each 2 × 2 block of pixels, rather than four pairs. This is where the 50% savings comes into play:
A 2 × 2 block of R, G, B = 4 + 4 + 4 = 12 bytes
A 2 × 2 block of downsampled Y, Cb, Cr = 4 + 1 + 1 = 6 bytes
From this point on, each component may be thought of as an individual channel that is encoded separately.
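The first step can be sketched in a few lines of Python. The conversion below uses the RGB-to-YCbCr coefficients given in the JFIF specification; the function names are invented for this example, and averaging the four chrominance values of a block is only one simple way to downsample (an encoder may filter differently).

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr using the JFIF coefficients."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def downsample_2x2(block):
    """Take a 2 x 2 block of pixels (a list of four (r, g, b) tuples) and
    return four Y values plus a single averaged Cb and Cr value."""
    converted = [rgb_to_ycbcr(*pixel) for pixel in block]
    ys = [round(y) for y, _, _ in converted]
    # Assumption: simple averaging of the four chrominance samples.
    cb = round(sum(c for _, c, _ in converted) / 4)
    cr = round(sum(c for _, _, c in converted) / 4)
    return ys, cb, cr


block = [(255, 0, 0), (250, 5, 5), (255, 10, 0), (245, 0, 10)]
ys, cb, cr = downsample_2x2(block)
print(len(block) * 3, "values in,", len(ys) + 2, "values out")
```

The twelve input values (four pixels of R, G, and B) become six output values (four Y, one Cb, one Cr), which matches the byte counts above.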
The second bit of physiological trivia taken into account by JPEG is that the eye is more sensitive to gradual changes in brightness than to sudden changes. Because of this, we can achieve another factor of compression with minimal change in the perceived image by throwing out information about the higher frequencies in the image. To separate low- and high-frequency information, we apply a Discrete Cosine Transform (DCT) to 8 × 8 blocks of the image data. The DCT is a big intimidating formula (it’s actually not that difficult to understand if you have some higher-level math skills), which we won’t print here because it would scare off potential readers just thumbing through the book looking for an access counter script. Seriously though, all you need to know about the DCT is that it gets the data into a form that makes it easy for the next step in the process to discard unnecessary information, and that it is the most time-intensive part of the encoding/decoding process.
At this point, it is not really accurate to think of the image as being stored as discrete pixels, any more than you would think of a real photograph as being composed of pixels. It is more accurate to think of the image as a table of values that refer to an abstract mathematical model that describes the image.
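For readers who would rather see the DCT as code than as a formula, here is a minimal and deliberately slow Python sketch of the forward two-dimensional DCT on a single 8 × 8 block. It is written for clarity only; production encoders use much faster factored algorithms, and a real encoder also subtracts 128 from each sample before the transform, which is omitted here. The function name `dct_8x8` is an assumption for this example.

```python
import math

N = 8


def dct_8x8(block):
    """Forward 2-D DCT (type II) of an 8 x 8 block given as a list of rows."""
    def c(k):
        # Normalization factor for the zero-frequency term.
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


# A perfectly flat block: every sample has the same value.
flat = [[10] * N for _ in range(N)]
coeffs = dct_8x8(flat)
largest_other = max(abs(coeffs[u][v])
                    for u in range(N) for v in range(N) if (u, v) != (0, 0))
print(coeffs[0][0], largest_other)
```

A flat block has no detail at all, so all of its information lands in the single lowest-frequency coefficient at `coeffs[0][0]`, and the other 63 coefficients are zero apart from floating-point noise. Blocks with more detail spread their energy into the higher-frequency coefficients, which are the ones the next step can afford to store less precisely.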
In the third step, each DCT coefficient in an 8 × 8 block is divided by the corresponding quantization coefficient and rounded to the nearest integer. The quantization coefficients are stored in a table along with the image, to be used in the decoding process. This quantization table is generally derived from an existing table (the JPEG specification defines a sample table), which is scaled according to the Q value (described earlier) that determines the quality of the resulting image. Files saved with a higher Q value will have their DCT coefficients divided by smaller numbers, which enables the image to be decoded more accurately but results in a larger file. Files with lower Q values will have coefficients divided by larger numbers and will be smaller, but the decoding process will be less exact. This step is where most of the information loss occurs.
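The effect of the Q value on the quantization table can be sketched as follows. The base table is the sample luminance table from Annex K of the JPEG specification, and the scaling rule follows the approach used by the IJG library (values below 50 scale the table up, values above 50 scale it down). Treat this as an illustration under those assumptions rather than a drop-in replacement for the library; the function names are invented for this example.

```python
# Sample luminance quantization table from Annex K of the JPEG specification.
BASE_LUMA_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scale_table(base, quality):
    """Scale a base table for a 1-100 quality setting (IJG-style rule)."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Each divisor is clamped to the range 1..255.
    return [[max(1, min(255, (q * scale + 50) // 100)) for q in row]
            for row in base]


def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


def dequantize(levels, table):
    """Decoder side: multiply each stored integer by its table entry."""
    return [[n * q for n, q in zip(nrow, qrow)]
            for nrow, qrow in zip(levels, table)]


coeffs = [[83.0] * 8 for _ in range(8)]  # made-up coefficients for the demo
for quality in (25, 75):
    table = scale_table(BASE_LUMA_TABLE, quality)
    restored = dequantize(quantize(coeffs, table), table)
    print(quality, table[0][0], restored[0][0])
```

At a Q value of 50 the table is used unchanged. Lower Q values produce larger divisors, so the value recovered by the decoder can drift further from the original coefficient; higher Q values produce smaller divisors and a closer match.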
A working JPEG decoder is more complicated than a GIF or a PNG decoder and would take up too much space in this chapter. There is, in fact, a whole book on the topic (see Section 1.5 below). The IJG’s free JPEG library, described in Chapter 3, includes several utilities written in C for decoding JPEG files.