Program: Soundex Name Comparisons
The difficulties in comparing (American-style)
names inspired the
development of the Soundex algorithm, in which each of a given set of
consonants maps to a particular number. This was apparently devised
for use by the Census Bureau to map similar-sounding names together
on the grounds that in those days many people were illiterate and
could not spell their parents’ names correctly. But it is still
useful today: for example, in a company-wide telephone book
application. The names Darwin and Derwin, for example, map to D650,
and Darwent maps to D653, which puts it adjacent to D650. All of
these are historical variants of the same name. Suppose we needed to
sort lines containing these names together: if we could output the
Soundex numbers at the front of each line, this would be easy. Here
is a simple demonstration of the Soundex
class:
/** Simple demonstration of Soundex. */ public class SoundexSimple { /** main */ public static void main(String[] args) { String[] names = { "Darwin, Ian", "Davidson, Greg", "Darwent, William", "Derwin, Daemon" }; for (int i = 0; i< names.length; i++) System.out.println(Soundex.soundex(names[i]) + ' ' + names[i]); } }
Let’s run it:
> jikes +E -d . SoundexSimple.java > java SoundexSimple | sort D132 Davidson, Greg D650 Darwin, Ian D650 Derwin, Daemon D653 Darwent, William >
As you can see, the Darwin-variant names (including Daemon Derwin[13]) all sort together and are distinct from the Davidson (and Davis, Davies, etc.) ...
Get Java Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.