56 PART II/COMPRESSION TECHNIQUES

For all of these codes, an integer N occurring with probability PN is encoded into or represented

by a codeword of L binary digits. A fundamental result of information theory is that for optimal

coding all codewords have a length L = log2(1 ~IN) bits, leading to the important general rule

that frequent values should have short codewords and less frequent values should have longer

codewords.

Efficiency: An efficient code, or compact code, is assumed to be one which, for each value,

approximates the minimum codeword length in comparison with other codes. In many codes

the codeword length is nearly a constant multiple of the binary representation. For these

codes the expansion relative to binary is a useful indication of efficiency.

Length indication: Every universal representation must somehow indicate its length or

signal the end of the bit stream for the codeword. Some representations are terminated by

a special bit pattern or comma. Others have an explicit length as part of the representation,

the length itself often using one of the terminated representations.

Transparency:

It is pleasant for human understanding, but certainly not essential, if the

value is easily extracted once a codeword is recognized. Some codes, for example, have a

portion of the codeword as the natural binary representation of the value.

Robustness:

The increasingly subtle coding techniques used to increase code efficiency

often reduce the redundancy of the code. But the sensitivity to transmission errors usually

increases with the decreasing redundancy. In extreme cases a single bit error can lead to

catastrophic error propagation and loss of codeword demarcation. Codes with a terminating

pattem or comma tend to be more robust and those with an encoded length less robust.

Instantaneousness:

This describes a code that is self-contained, in that a codeword can be

decoded without reference to adjoining codewords. A non-instantaneous code is fine in a

long sequence of codewords from the same code, but may be inconvenient if the compact

codes are interspersed with other, instantaneous, codes.

As a final introductory note, within this chapter the term bit will always mean binary digit

rather than an information measure. A codeword (the encoded collection of bits to represent the

value) is usually longer than the binary representation of the value (also in bits).

3.3 POLYNOMIAL REPRESENTATIONS

Many representations for an integer N combine a visible digit vector d with an implicit weight

vector w, such that N = d- w, the scalar product of the two vectors. A polynomial in some base

b results if successive terms of the weight vector are given by Wi+l ~ bwi. While an obvious

example of a polynomial representation is the familiar decimal representation, it is more instructive

to consider some other forms.

Binary:

(b = 2) The weight vector is powers of 2, least to the fight, or w = { .... 128, 64,

32, 16, 8, 4, 2, 1 } and the visible digits di ~ {0, 1 }. The (decimal) value 23 is represented

by 10111

23=lx16+0x8+lx4+lx2+l x 1.

Ternary:

(b = 3) The weight vector is now w = { .... 729, 243, 81, 27, 9, 3, 1} and

di ~ {0, 1, 2}. 20910 is represented by 21202

209=2 • 81+ 1 • •215 32+2 • 1.

Get *Lossless Compression Handbook* now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.