11 Toy Examples
In this chapter, we discuss some toy examples of stochastic processes that can or cannot model particular statistical phenomena possibly exhibited by natural texts, i.e., texts in natural language. As mentioned in Section 1.15, there are four phenomena pertaining to natural texts that we may consider first:
- The strictly positive Shannon entropy rate:
(11.1)
- The power‐law growth of block mutual information:
(11.2)
- The power‐law logarithmic growth of maximal repetition:
(11.3)
- The power‐law decay of mutual information between individual symbols:
(11.4)
Respectively, the above laws have been proposed by Shannon (1951), Hilberg (1990), Dębowski (2012b), and Lin and Tegmark (2017).
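To make two of the quantities named above concrete, here is a minimal Python sketch. It assumes the usual definition of maximal repetition as the length of the longest substring that occurs at least twice in a text (occurrences may overlap), and it uses a naive plug-in estimate of the mutual information between symbols a fixed distance apart. The function names `maximal_repetition` and `lag_mutual_information` are hypothetical and chosen only for illustration; neither is taken from the cited works.

```python
from collections import Counter
from math import log2


def maximal_repetition(text: str) -> int:
    """Length of the longest substring occurring at least twice (overlaps allowed)."""
    n = len(text)

    def has_repeat(k: int) -> bool:
        # True if some substring of length k appears at two different positions.
        seen = set()
        for i in range(n - k + 1):
            chunk = text[i:i + k]
            if chunk in seen:
                return True
            seen.add(chunk)
        return False

    # If a substring of length k repeats, so does its prefix of length k - 1,
    # so the predicate is monotone in k and binary search applies.
    lo, hi = 0, max(n - 1, 0)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_repeat(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def lag_mutual_information(text: str, lag: int) -> float:
    """Plug-in estimate (in bits) of mutual information between symbols `lag` apart."""
    pairs = [(text[i], text[i + lag]) for i in range(len(text) - lag)]
    if not pairs:
        return 0.0
    total = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # Sum of p(a, b) * log2( p(a, b) / (p(a) * p(b)) ) over observed pairs.
    return sum(
        (c / total) * log2(c * total / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )
```

Computing these two functions for growing prefixes of a text, or for growing lags, gives a rough empirical view of the growth and decay behaviors listed above. The plug-in estimate is biased upward on short samples, so it should be read as an illustration rather than as a reliable estimator.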
Intentionally, we have mentioned above only those statistical phenomena that can be expressed using purely information‐theoretic concepts, such as entropy, mutual information, or maximal repetition. It should be noted that quantitative linguists have identified a few other laws; see Köhler et al. (2005). Potentially, for the approach developed in our book, the most interesting ...