Dawson stemming

The Dawson stemmer extends the approach of the Lovins stemmer by using a much larger suffix list, with more than a thousand English suffixes. The generic algorithm for the Dawson stemmer is as follows:

1. Get the input word.
2. Get the matching suffix:
    2a. The suffix pool is reverse indexed by length.
    2b. The suffix pool is reverse indexed by the last character.
3. Remove the longest suffix from the word with an exact match.
4. Recode the word using a mapping table.
5. Convert the stem into a valid word.
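The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not Dawson's actual implementation: the suffix list (SUFFIXES), the recoding table (RECODE), the minimum stem length (MIN_STEM_LEN), and all function names are hypothetical stand-ins chosen for the example, and the real suffix table is far larger. Steps 4 and 5 are folded into a single recode function here.

```python
from collections import defaultdict

# Toy suffix pool (assumption): a handful of entries for illustration only,
# not the real table of more than a thousand suffixes.
SUFFIXES = ["ization", "ation", "ion", "ness", "ing", "ed", "es", "s"]

# Toy recoding table (assumption): maps a stem ending to a replacement
# so that the stripped stem becomes a valid word.
RECODE = {"iev": "ief", "rpt": "rb"}

# Assumed minimum number of characters that must remain after stripping.
MIN_STEM_LEN = 2


def build_index(suffixes):
    """Index the suffix pool by (length, last character), as in steps 2a and 2b."""
    index = defaultdict(set)
    for suffix in suffixes:
        index[(len(suffix), suffix[-1])].add(suffix)
    return index


INDEX = build_index(SUFFIXES)
MAX_SUFFIX_LEN = max(len(s) for s in SUFFIXES)


def strip_suffix(word):
    """Step 3: remove the longest suffix that exactly matches the word ending."""
    last_char = word[-1]
    longest = min(MAX_SUFFIX_LEN, len(word) - MIN_STEM_LEN)
    # Try candidate lengths from longest to shortest; the first hit wins.
    for length in range(longest, 0, -1):
        if word[-length:] in INDEX.get((length, last_char), ()):
            return word[:-length]
    return word


def recode(stem):
    """Steps 4 and 5: rewrite the stem ending using the mapping table."""
    for ending, replacement in RECODE.items():
        if stem.endswith(ending):
            return stem[: -len(ending)] + replacement
    return stem


def stem(word):
    """Step 1 onward: take the input word and return its stem."""
    word = word.lower()
    if not word:
        return word
    return recode(strip_suffix(word))


for w in ["believing", "absorption", "organization", "kindness"]:
    print(w, "->", stem(w))
```

With these toy tables, the loop prints belief, absorb, organ, and kind. Keying the index on both length and last character means each lookup only touches the few suffixes that could possibly match, which is why a single pass over the candidate lengths is enough.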

The advantages of the Dawson stemmer are as follows:

  • It covers a wider range of suffixes and hence produces a more accurate stemming output
  • It is a single-pass algorithm, which makes it efficient
