20.5 The Entropy of English

In an English text, how much information is obtained per letter? If we had a random sequence of letters, each appearing with probability 1/26, then the entropy would be log₂(26) ≈ 4.70; so each letter would contain 4.70 bits of information. If we include spaces, we get log₂(27) ≈ 4.75. But the letters are not equally likely: a has frequency .082, b has frequency .015, etc. (see Section 2.3). Therefore, we consider

−(.082 log₂ .082 + .015 log₂ .015 + ⋯) ≈ 4.18.

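This is just the entropy formula H = −Σ pᵢ log₂ pᵢ applied to the single-letter frequencies. As a rough check, here is a minimal Python sketch; the frequency table is the standard one cited in Section 2.3, but since the exact values are assumed here, the last digit of the result may differ slightly from 4.18.

```python
from math import log2

# Approximate single-letter frequencies of English (the table in Section 2.3).
# These are assumed values from the standard table, so the computed entropy
# may differ from 4.18 in the last digit.
freq = {
    'a': .082, 'b': .015, 'c': .028, 'd': .043, 'e': .127, 'f': .022,
    'g': .020, 'h': .061, 'i': .070, 'j': .002, 'k': .008, 'l': .040,
    'm': .024, 'n': .067, 'o': .075, 'p': .019, 'q': .001, 'r': .060,
    's': .063, 't': .091, 'u': .028, 'v': .010, 'w': .024, 'x': .002,
    'y': .020, 'z': .001,
}

def entropy(probs):
    """H = -sum(p * log2(p)) over the nonzero probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1/26] * 26))      # uniform over 26 letters: about 4.70
print(entropy([1/27] * 27))      # uniform over 27 symbols (with space): about 4.75
print(entropy(freq.values()))    # English letter frequencies: about 4.18
```
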
However, this doesn’t tell the whole story. Suppose we have the sequence of letters "we are studyin". There is very little uncertainty about the missing last letter; it is easy to guess that it is g. Similarly, if we see the letter q, it is extremely likely that the next letter is u.

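This drop in uncertainty can be made quantitative by estimating the conditional entropy of the next letter given the current letter from digram counts in a sample of English. The following is a minimal sketch; the function name, the restriction to alphabetic characters, and the choice of sample text are my own assumptions, not part of the text.

```python
from collections import Counter
from math import log2

def conditional_entropy(text):
    """Estimate H(next letter | current letter) from digram counts in text.

    A rough sketch: only alphabetic characters are kept, case is ignored,
    and the estimate is only as good as the sample text supplied.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    pairs = Counter(zip(letters, letters[1:]))   # digram counts
    firsts = Counter(letters[:-1])               # counts of the first letter of each digram
    total = len(letters) - 1
    h = 0.0
    for (a, b), count in pairs.items():
        p_pair = count / total        # P(current = a, next = b)
        p_cond = count / firsts[a]    # P(next = b | current = a)
        h -= p_pair * log2(p_cond)
    return h

# On a reasonably long English sample, this comes out well below the
# single-letter entropy, which is the point of the discussion above:
# knowing the preceding letter (a q, for instance) narrows the choices.
```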