Cheating at Word Puzzles
Crossword puzzles give you clues about words, but most of us get stuck when we cannot think of, say, a ten-letter word that begins with a b and has either an x or a z in the seventh position.
Regular-expression pattern matching with awk or grep
is clearly called for, but what files do we search? One good choice is
the Unix spelling dictionary, available as /usr/dict/words on many systems. (Other
popular locations for this file are /usr/share/dict/words and /usr/share/lib/dict/words.) This is a simple
text file, with one word per line, sorted in lexicographic order. We can
easily create other similar-appearing files from any collection of text
files, like this:
    cat file(s) | tr A-Z a-z | tr -c a-z\' '\n' | sort -u
The second pipeline stage converts uppercase to lowercase, the
third replaces nonletters by newlines, and the last sorts the result,
keeping only unique lines. The third stage treats apostrophes as
letters, since they are used in contractions. Every Unix system has
collections of text that can be mined in this way—for example, the
formatted manual pages in /usr/man/cat*/* and /usr/local/man/cat*/*. On one of our systems,
they supplied more than 1 million lines of prose and produced a list of
about 44,000 unique words. There are also word lists for dozens of
languages in various Internet archives.[6]
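The word-list pipeline described above can be wrapped in a small shell function so that it is easy to try on sample text. This is only a sketch: the name make_wordlist is hypothetical, and the final sed stage is an addition that is not part of the original pipeline. It removes the single empty line that is otherwise left behind when two nonletters appear in a row.

```shell
# make_wordlist: hypothetical helper name, not from the original text.
# Reads the named files (or standard input when none are given) and
# prints one lowercase word per line, sorted, with duplicates removed.
make_wordlist() {
    cat "$@" |
        tr 'A-Z' 'a-z' |        # fold uppercase to lowercase
        tr -c "a-z'" '\n' |     # replace everything but letters and ' with newlines
        sort -u |               # sort, keeping one copy of each line
        sed '/^$/d'             # addition: drop the empty line left by adjacent nonletters
}

# Try it on two short lines of sample text.
printf "The cat and the dog.\nIt's a CAT?\n" | make_wordlist
```

Because the function passes its arguments straight to cat, it can be pointed at any set of text files, and its output redirected to a file to build up a collection of word lists.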
Let us assume that we have built up a collection of word lists in this way, and we stored them in a standard place that we can reference from a script. ...
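The puzzle from the opening paragraph, a ten-letter word that begins with b and has an x or a z in the seventh position, translates directly into an extended regular expression. The following is a minimal sketch, not the script the text goes on to develop: the function name find_words is hypothetical, and it assumes that the word lists are plain files with one word per line.

```shell
# find_words: hypothetical helper name, not from the original text.
# $1 is an extended regular expression; the remaining arguments are
# word-list files with one word per line.
find_words() {
    pattern=$1
    shift
    # Concatenate the lists, match case-insensitively against whole lines
    # (the pattern supplies its own ^ and $ anchors), and merge duplicates.
    cat "$@" | grep -E -i -e "$pattern" | sort -u
}

# The pattern reads: b, then any five characters, then x or z in
# position seven, then any three characters, for ten in all.
# Example call (the path is one of the locations named earlier):
#   find_words '^b.{5}[xz].{3}$' /usr/share/dict/words
```

A word such as bamboozled fits this pattern, since its seventh letter is z. Passing several files to the function searches all of them at once, which is why the result goes through sort -u.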