The Original Unix Spellchecking Prototype
Spellchecking has been the subject of more than 300 research papers and books.[1] In his book Programming Pearls,[2] Jon Bentley reported that Steve Johnson wrote the first version of spell in an afternoon in 1975. Bentley then sketched a reconstruction of that program, credited to Kernighan and Plauger,[3] as a Unix pipeline that we can rephrase in modern terms like this:
    prepare filename |                # Remove formatting commands
      tr A-Z a-z |                    # Map uppercase to lowercase
        tr -c a-z '\n' |              # Remove punctuation
          sort |                      # Put words in alphabetical order
            uniq |                    # Remove duplicate words
              comm -13 dictionary -   # Report words not in dictionary
Here, prepare is a filter that strips
whatever document markup is present; in the simplest case, it is just
cat. We assume the argument syntax
for the GNU version of the tr
command.
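
To make the first three stages concrete, here is a minimal sketch that runs them on one line of text. The sample sentence and the variable name words are hypothetical, chosen only for illustration, and cat stands in for prepare because the sample has no markup to strip.

```shell
# Hypothetical one-line sample; cat plays the role of prepare (nothing to strip).
words=$(printf 'The quick-brwon Fox\n' |
    cat |
    tr A-Z a-z |          # map uppercase to lowercase
    tr -c a-z '\n')       # turn every character that is not a-z into a newline

# One lowercase word per line: the, quick, brwon, fox
printf '%s\n' "$words"
```

Each non-letter character becomes a newline, so the result is one candidate word per line, ready for sort and uniq.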
The only program in this pipeline that we have not seen
before is comm: it compares two
sorted files and selects, or rejects, lines common to both. Here, with
the -13 option, it outputs only lines from the second
file (the piped input) that are not in the first file (the dictionary).
That output is the spelling-exception report.
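
The effect of comm -13 can be seen with a tiny word list. This is only a sketch under stated assumptions: the four-word dictionary, the scratch directory, and the variable name exceptions are hypothetical, and the C locale is forced so that sort and comm agree on the ordering.

```shell
# comm expects both inputs to be sorted in the same collation order.
LC_ALL=C; export LC_ALL

tmpdir=$(mktemp -d)    # hypothetical scratch directory for the sample dictionary
printf '%s\n' brown fox quick the > "$tmpdir/dictionary"   # already sorted

# Words from a hypothetical document, one per line, piped in as the second file.
exceptions=$(printf '%s\n' the quick brwon fox jumpd the |
    sort |
    uniq |
    comm -13 "$tmpdir/dictionary" -)

# Only the words missing from the dictionary remain: brwon and jumpd
printf '%s\n' "$exceptions"
rm -rf "$tmpdir"
```

The option -1 suppresses lines found only in the dictionary and -3 suppresses lines found in both inputs, which leaves just the words that appear only in the piped input.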