11 Toy Examples

In this chapter, we discuss toy examples of stochastic processes that can or cannot model particular statistical phenomena which may be exhibited by natural texts, i.e., texts in natural language. As mentioned in Section 1.15, there are four phenomena pertaining to natural texts which we may consider first:

  • The strictly positive Shannon entropy rate:
    $$h := \lim_{n\to\infty} \frac{H(X_1^n)}{n} > 0 \tag{11.1}$$
  • The power‐law growth of block mutual information:
    $$I(X_1^n; X_{n+1}^{2n}) \propto n^{\beta}, \quad 0 < \beta < 1 \tag{11.2}$$
  • The power‐law logarithmic growth of maximal repetition:
    $$L(X_1^n) \propto (\log n)^{\alpha}, \quad \alpha > 1 \tag{11.3}$$
  • The power‐law decay of mutual information between individual symbols:
    $$I(X_i; X_{i+n}) \propto n^{-\gamma}, \quad \gamma > 0 \tag{11.4}$$
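As an informal illustration of how the quantities in (11.3) and (11.4) can be measured on a finite string, the following Python sketch computes the maximal repetition of a text and a plug-in estimate of the mutual information between symbols d positions apart. It assumes that maximal repetition means the length of the longest block occurring at least twice in the text, with overlapping occurrences allowed. The function names are hypothetical, and the naive plug-in estimator is our choice for illustration, not necessarily the estimator used by Dębowski (2012b) or Lin and Tegmark (2017).

```python
import math
import random
from collections import Counter


def has_repeat(text: str, k: int) -> bool:
    """Return True if some block of length k occurs at least twice in text."""
    seen = set()
    for i in range(len(text) - k + 1):
        block = text[i:i + k]
        if block in seen:
            return True
        seen.add(block)
    return False


def maximal_repetition(text: str) -> int:
    """Length of the longest block occurring at least twice (overlaps allowed).

    A repeat of length k implies repeats of every shorter length (take the
    prefixes of both occurrences), so galloping plus binary search is valid.
    """
    lo, hi = 0, 1  # invariant: a repeat of length lo exists (0 trivially)
    # Gallop: double hi until no repeat of length hi exists or hi reaches n.
    while hi < len(text) and has_repeat(text, hi):
        lo, hi = hi, 2 * hi
    hi = min(hi, len(text))  # no block of length n can occur twice
    # Binary search with a repeat at lo and none at hi.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if has_repeat(text, mid):
            lo = mid
        else:
            hi = mid
    return lo


def symbol_mutual_information(text: str, d: int) -> float:
    """Naive plug-in estimate, in bits, of I(X_i; X_{i+d}) from pair counts."""
    pairs = Counter(zip(text, text[d:]))
    total = sum(pairs.values())
    left, right = Counter(), Counter()
    for (a, b), c in pairs.items():
        left[a] += c
        right[b] += c
    # Sum of p(a,b) * log2(p(a,b) / (p(a) p(b))) with empirical frequencies.
    return sum(
        (c / total) * math.log2(c * total / (left[a] * right[b]))
        for (a, b), c in pairs.items()
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    for n in (2**10, 2**12, 2**14):
        # Uniform i.i.d. symbols over a 4-letter alphabet (a finite-state toy).
        sample = "".join(rng.choice("acgt") for _ in range(n))
        print(n, maximal_repetition(sample), symbol_mutual_information(sample, 1))
```

For such an i.i.d. sample the entropy rate is positive, so (11.1) holds, but the true mutual information between symbols is zero and the maximal repetition grows only roughly in proportion to log n, so (11.3) with α > 1 and (11.4) fail. The printed mutual information should therefore stay close to zero; the plug-in estimate is never negative and is biased slightly upward.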

The above laws were proposed, respectively, by Shannon (1951), Hilberg (1990), Dębowski (2012b), and Lin and Tegmark (2017).

We have intentionally mentioned only those statistical phenomena that can be expressed in purely information‐theoretic terms, such as entropy, mutual information, or maximal repetition. Quantitative linguists have identified a few other laws; see Köhler et al. (2005). For the approach developed in this book, potentially the most interesting ...
