May 2017
The egrep command converts the text file into a stream of words, one word per line. The \b[[:alpha:]]+\b pattern matches each run of alphabetic characters that is bounded by word boundaries, so whitespace and punctuation never appear in a match. The -o option prints only the matched character sequences, each on its own line.
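As a minimal sketch of this step, the following wraps the pattern in a helper and feeds it one sample sentence. The function name split_words and the sample text are illustrative assumptions, not part of the recipe. grep -E is used as the equivalent of egrep, and \b is assumed to be supported by the installed grep (it is in GNU grep).

```shell
# split_words: hypothetical helper that reads text on stdin and
# prints every alphabetic word on its own line.
split_words() {
    # -E enables extended regular expressions (the same as egrep);
    # -o prints only the matched text, one match per output line.
    grep -E -o '\b[[:alpha:]]+\b'
}

# Sample input (made up for illustration): punctuation and spaces
# are not part of any match, so they do not reach the output.
printf 'Hello, world! Hello again.\n' | split_words
```

The words keep their original order and capitalization, so repeated words appear once per occurrence in the stream.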
The awk command counts each word. It executes the statements in the { } block once for every input line, so we don't need an explicit loop. The count[$0]++ statement increments the count, where $0 is the current line (one word) and count is an associative array indexed by that word. After all the lines are processed, the END{} block prints the words and their counts.
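The counting step can be sketched on its own as follows, assuming the input already has one word per line. The function name count_words, the loop variable word, and the sample input are illustrative assumptions. awk does not guarantee the order in which for (word in count) visits the array, so the output is piped through sort to make it stable.

```shell
# count_words: hypothetical helper that reads one word per line on
# stdin and prints each distinct word followed by its count.
count_words() {
    awk '
        # Runs once per input line: $0 is the whole line (one word).
        { count[$0]++ }
        # Runs once after the last line has been read.
        END {
            for (word in count)
                print word, count[word]
        }
    '
}

# Sample input (made up for illustration), sorted for a stable order.
printf 'Hello\nworld\nHello\nagain\n' | count_words | sort
```

Because count is indexed by the word itself, a word that has not been seen yet starts from an empty value that awk treats as 0, so the first ++ sets it to 1.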
The body of this procedure can be modified using other tools we've looked at. We can merge capitalized and non-capitalized words into ...