We have shown the power of a software development methodology that uses large amounts of data to solve ill-posed problems in uncertain environments. In this chapter it was language data, but many of the same lessons apply to other data.
In the examples we have explored, the programs are simple and succinct because the probabilistic models are simple. These simple models ignore much of what humans know. Clearly, to segment "choosespain.com" we draw on specific knowledge of how the travel business works, among other factors. The surprising result is that a program does not have to represent all that knowledge explicitly; it gets much of the knowledge implicitly from the n-grams, which reflect what other humans have chosen to talk about. In the past, probabilistic models were more complex because they relied on less data.
There was an emphasis on statistically sophisticated forms of smoothing to compensate for missing data. Now that very large corpora are available, we can use simple approaches like stupid backoff and no longer worry as much about the smoothing model.
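To make the contrast concrete, here is a minimal sketch of stupid backoff scoring. The count tables and the backoff factor alpha = 0.4 are illustrative assumptions, not values from this chapter; the point is only the shape of the method: use the bigram estimate when the bigram was observed, otherwise back off to a discounted unigram estimate, with no attempt to produce a properly normalized probability.

```python
# Toy count tables (assumed for illustration only).
UNIGRAMS = {'the': 50, 'cat': 5, 'sat': 3}
BIGRAMS = {('the', 'cat'): 2}
TOTAL = sum(UNIGRAMS.values())

def score(word, prev=None, alpha=0.4):
    """Stupid-backoff score for `word` given the previous word.

    Not a true probability: scores need not sum to 1, which is
    what makes the method "stupid" -- and cheap at large scale.
    """
    if prev is not None and (prev, word) in BIGRAMS:
        # Bigram seen in the corpus: use the relative frequency.
        return BIGRAMS[(prev, word)] / UNIGRAMS[prev]
    # Otherwise back off to the (discounted) unigram estimate.
    return alpha * UNIGRAMS.get(word, 0) / TOTAL
```

With large corpora, the simplicity pays off: there are no held-out parameters to fit, just counts to look up.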
Most of the complexity in the programs we studied in this chapter was due to the search strategy. We saw three classes of search strategy:
For shift ciphers, there are only 26 candidates; we can test them all.
For segmentation, there are 2^(n-1) candidates, but most can be proved nonoptimal (given the independence assumption) without examining them.
For full substitution ciphers, we can't ...
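The first strategy, exhaustive search over shift ciphers, fits in a few lines. This is a sketch: the scoring function logP is assumed to be some language-model score (higher meaning more English-like), not a specific function from this chapter.

```python
import string

def shift_encode(msg, n):
    """Encode `msg` by rotating each lowercase letter n places."""
    letters = string.ascii_lowercase
    table = str.maketrans(letters, letters[n:] + letters[:n])
    return msg.translate(table)

def decode_shift(msg, logP):
    """Try all 26 shifts and return the candidate logP likes best."""
    candidates = [shift_encode(msg, n) for n in range(26)]
    return max(candidates, key=logP)
```

Because the candidate space has only 26 elements, no cleverness is needed: generate them all and let the language model pick.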
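The second strategy, pruning the 2^(n-1) segmentations, can be sketched with memoized recursion. The unigram table Pw below is a toy stand-in for real corpus counts. Under the independence assumption, the best segmentation of the remainder does not depend on earlier choices, so caching `segment` on its argument means each suffix is solved once rather than once per path to it.

```python
from functools import lru_cache
from math import prod

# Toy unigram scores (assumed for illustration; a real model
# would derive these from corpus counts).
Pw = {'choose': 0.002, 'spain': 0.001, 'chooses': 0.0005, 'pain': 0.001}

def score(word):
    """Unigram score, with a tiny penalty score for unknown words."""
    return Pw.get(word, 1e-9)

@lru_cache(maxsize=None)
def segment(text):
    """Best segmentation of `text` as a tuple of words."""
    if not text:
        return ()
    # Try every first word; recursion on the rest is memoized,
    # so we never enumerate all 2**(n-1) full segmentations.
    candidates = [(text[:i],) + segment(text[i:])
                  for i in range(1, len(text) + 1)]
    return max(candidates, key=lambda ws: prod(score(w) for w in ws))
```

The cache turns an exponential enumeration into O(n) subproblems, each examining at most n first-word choices.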