Entropy

A very common event does not carry much information. If we want to catch someone's attention, we have to produce an unexpected effect, something that occurs with very low probability. If we merely splash ink on paper, we produce random blobs, which are not very informative. If, however, we deposit the ink in the form of letters, it is extremely unlikely that the pattern appeared spontaneously, and it is thus able to carry information.
Thus, the less probable an occurrence is, the more information it can convey. If a single event is expected to occur with probability p, its rarity or surprise value can be measured by 1/p. This quantity may thus be taken to signify the information-carrying capacity of an event occurring with probability p. Such a measure is, however, not very expedient. Consider two independent events, one occurring with probability p and the other with probability q; the information-carrying capacity of the pair should be the sum of the individual capacities. On the one hand, we have

1/p + 1/q = (p + q)/(pq),

but on the other hand, regarded as a single combined event of probability pq, the two consecutive events should give the information capacity 1/(pq). The two expressions agree only if p + q = 1, which is an exceptional special case. This shortcoming was overcome in 1948 when Shannon introduced his entropy concept,
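The failure of additivity, and the way a logarithmic measure repairs it, can be checked numerically. The sketch below uses log2(1/p), the base-2 logarithmic surprise on which Shannon's measure is built; the particular values p = 1/2 and q = 1/4 are chosen only for illustration.

```python
import math

# Two independent events with probabilities p and q (illustrative values).
p, q = 0.5, 0.25

# Candidate measure 1/p: the sum of the individual capacities...
sum_individual = 1 / p + 1 / q        # = (p + q)/(pq)
# ...does not match the capacity of the combined event (probability pq).
combined = 1 / (p * q)
print(sum_individual, combined)       # differ, since p + q != 1

# Logarithmic measure log2(1/p): additivity holds exactly,
# because log(1/p) + log(1/q) = log(1/(pq)).
sum_log = math.log2(1 / p) + math.log2(1 / q)
combined_log = math.log2(1 / (p * q))
print(sum_log, combined_log)          # equal
```

With these values the two reciprocal-based quantities are 6 and 8, while both logarithmic quantities equal 3 bits, as the identity log(1/p) + log(1/q) = log(1/(pq)) guarantees for any p and q.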