Dawson stemming

This stemmer follows the same approach as the Lovins stemmer but extends it with a list of more than a thousand English suffixes. The generic algorithm for the Dawson stemmer is as follows:

1. Get the input word.
2. Get the matching suffix:
    a. The suffix pool is reverse indexed by length.
    b. The suffix pool is reverse indexed by the last character.
3. Remove the longest suffix that exactly matches the end of the word.
4. Recode the word using a mapping table.
5. Convert the stem into a valid word.
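The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Dawson's actual implementation: the suffix pool (`SUFFIXES`), the recoding table (`RECODE`), and the minimum stem length (`MIN_STEM`) are hypothetical stand-ins for the real thousand-plus suffix list and mapping table, any per-suffix conditions are ignored, and step 5 (turning the stem into a valid word) is omitted. The sketch shows how indexing suffixes by length and last character lets a single pass find the longest exact match.

```python
from collections import defaultdict

# Hypothetical, tiny suffix pool for illustration only; the real Dawson
# list contains more than a thousand English suffixes.
SUFFIXES = ["ational", "ization", "ness", "ment", "ing", "ies", "ed", "ly", "s"]

# Hypothetical recoding table (stem ending -> replacement). It mimics the
# idea of a mapping table but is NOT Dawson's actual table.
RECODE = {"iev": "ief", "uct": "uc", "bb": "b", "tt": "t", "pp": "p", "nn": "n"}

MIN_STEM = 2  # assumed minimum number of characters left after removal


def build_index(suffixes):
    """Index suffixes by (length, last character); return lengths longest-first."""
    index = defaultdict(set)
    for s in suffixes:
        index[(len(s), s[-1])].add(s)
    lengths = sorted({len(s) for s in suffixes}, reverse=True)
    return index, lengths


INDEX, LENGTHS = build_index(SUFFIXES)


def strip_suffix(word):
    """Remove the longest suffix that matches exactly, in a single pass."""
    for n in LENGTHS:  # try longer suffixes first
        if len(word) - n < MIN_STEM:
            continue  # removing this suffix would leave too short a stem
        # Only suffixes of length n ending in word's last character are checked.
        if word[-n:] in INDEX.get((n, word[-1]), ()):
            return word[:-n]
    return word


def recode(stem):
    """Apply the first matching rule from the recoding table to the stem ending."""
    for ending, replacement in RECODE.items():
        if stem.endswith(ending):
            return stem[: -len(ending)] + replacement
    return stem


def dawson_like_stem(word):
    """Steps 1-4 of the generic algorithm (step 5 is not implemented)."""
    return recode(strip_suffix(word.lower()))


if __name__ == "__main__":
    for w in ["running", "believed", "cats", "organization", "is"]:
        print(w, "->", dawson_like_stem(w))
    # running -> run, believed -> belief, cats -> cat,
    # organization -> organ, is -> is
```

Because the index is keyed on both length and final character, each candidate length needs only one set lookup rather than a scan over the whole suffix pool, which is what keeps the longest-match search cheap even with a very large list.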

The advantages of the Dawson stemmer are as follows:

  • It covers a wider range of suffixes than the Lovins stemmer and hence produces more accurate stems
  • It is a single-pass algorithm, which makes it efficient
