
Project: Shannon Entropy
In 1948, Claude Shannon founded the field of information theory with his
paper, “A Mathematical Theory of Communication” [5]. In it, he defined the
entropy H of a message as
H = -\sum_i p_i \log_2(p_i)
where p_i is the probability of the ith character occurring in the message. This
probability can be easily calculated if we count the number of times each
character appears in the message:
p_i = \frac{\text{number of times ith character appears}}{\text{length of message}}
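As a quick illustration, here is a minimal sketch in Python of computing these per-character probabilities by counting; the function name char_probabilities and the example message are illustrative, not part of the project.

```python
from collections import Counter

def char_probabilities(message: str) -> dict[str, float]:
    """Map each character to its relative frequency in the message."""
    counts = Counter(message)   # number of times each character appears
    n = len(message)            # length of message
    return {ch: count / n for ch, count in counts.items()}

# Example: in "aab", p('a') = 2/3 and p('b') = 1/3.
print(char_probabilities("aab"))
```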
When entropy H is calculated as above, using the log base two, it measures
the average number of bits per character required to communicate the given
message.
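Putting the two formulas together, a short sketch of the full entropy calculation might look like the following; the function name shannon_entropy is an assumption for illustration, not a name taken from the project.

```python
import math
from collections import Counter

def shannon_entropy(message: str) -> float:
    """Average bits per character: H = -sum_i p_i * log2(p_i)."""
    counts = Counter(message)
    n = len(message)
    probs = [c / n for c in counts.values()]   # p_i for each distinct character
    return sum(-p * math.log2(p) for p in probs)

# A message of identical characters needs 0 bits per character,
# while "abcd" (four equally likely characters) needs 2 bits per character.
print(shannon_entropy("aaaa"))  # 0.0
print(shannon_entropy("abcd"))  # 2.0
```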
Example: Low Entropy
Intuitively, in a string ...