20.5 The Entropy of English

In an English text, how much information is obtained per letter? If we had a random sequence of letters, each appearing with probability 1/26, then the entropy would be log₂(26) ≈ 4.70; so each letter would contain 4.70 bits of information. If we include spaces, we get log₂(27) ≈ 4.75. But the letters are not equally likely: a has frequency .082, b has frequency .015, etc. (see Section 2.3). Therefore, we consider

−(.082 log₂ .082 + .015 log₂ .015 + ⋯) ≈ 4.18.

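This is just the entropy formula H = −Σ pᵢ log₂ pᵢ applied to the single-letter frequencies. As a rough check, here is a minimal Python sketch; the frequency table is the standard one cited in Section 2.3, but since the exact values are assumed here, the last digit of the result may differ slightly from 4.18.

```python
from math import log2

# Approximate single-letter frequencies of English (the table in Section 2.3).
# These are assumed values from the standard table, so the computed entropy
# may differ from 4.18 in the last digit.
freq = {
    'a': .082, 'b': .015, 'c': .028, 'd': .043, 'e': .127, 'f': .022,
    'g': .020, 'h': .061, 'i': .070, 'j': .002, 'k': .008, 'l': .040,
    'm': .024, 'n': .067, 'o': .075, 'p': .019, 'q': .001, 'r': .060,
    's': .063, 't': .091, 'u': .028, 'v': .010, 'w': .024, 'x': .002,
    'y': .020, 'z': .001,
}

def entropy(probs):
    """H = -sum(p * log2(p)) over the nonzero probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1/26] * 26))      # uniform over 26 letters: about 4.70
print(entropy([1/27] * 27))      # uniform over 27 symbols (with space): about 4.75
print(entropy(freq.values()))    # English letter frequencies: about 4.18
```
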
However, this doesn’t tell the whole story. Suppose we have the sequence of letters "we are studyin". There is very little uncertainty about the missing last letter; it is easy to guess that it is g. Similarly, if we see the letter q, it is extremely likely that the next letter is u.

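This drop in uncertainty can be made quantitative by estimating the conditional entropy of the next letter given the current letter from digram counts in a sample of English. The following is a minimal sketch; the function name, the restriction to alphabetic characters, and the choice of sample text are my own assumptions, not part of the text.

```python
from collections import Counter
from math import log2

def conditional_entropy(text):
    """Estimate H(next letter | current letter) from digram counts in text.

    A rough sketch: only alphabetic characters are kept, case is ignored,
    and the estimate is only as good as the sample text supplied.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    pairs = Counter(zip(letters, letters[1:]))   # digram counts
    firsts = Counter(letters[:-1])               # counts of the first letter of each digram
    total = len(letters) - 1
    h = 0.0
    for (a, b), count in pairs.items():
        p_pair = count / total        # P(current = a, next = b)
        p_cond = count / firsts[a]    # P(next = b | current = a)
        h -= p_pair * log2(p_cond)
    return h

# On a reasonably long English sample, this comes out well below the
# single-letter entropy, which is the point of the discussion above:
# knowing the preceding letter (a q, for instance) narrows the choices.
```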