Chapter 14. Simil: an algorithm to look for similar strings
Tom van Stiphout
Are you a perfect speller? Is everyone in your company? How about your business partners? Misspellings are a fact of life. There are also legitimate differences in spelling: what Americans call rumors, the British call rumours. Steven A. Ballmer and Steve Ballmer are two different but accurate forms of that man’s name. Your database may also contain many legacy values entered before stricter validation was enforced at the point of data entry.
Overall, chances are your database already contains imperfect textual data, which makes it hard to search. Additionally, the user may not know exactly what to look for. When looking for a number or a date, we could search for a range, ...