
Access Data Analysis Cookbook by Wayne S. Freeze, Ken Bluttman


Applying Proximate Matching

Problem

At times, it is necessary to look for data that is similar rather than identical. This could mean words or names that are spelled slightly differently, that are the same length, or that start with the same character and are the same length even though the remaining characters differ. How do you code a routine to find such values?

Solution

Matching items that are similar is as much an art as it is a programming discipline. There are many rules that can be implemented, so it is best to determine your exact needs or expectations of how the data might be similar, and then code appropriately.

Figure 6-23 shows a table containing similar names. This recipe discusses a few methods for comparing each of these names with the name Johnson (which happens to be the first entry in the table).


Figure 6-23. A table of similar names

Discussion

To demonstrate, we'll consider three matching approaches:

  1. The first approach compares the lengths of the two strings and returns a ratio indicating how closely they match. A result of 1 means the strings are exactly the same length; a result below 1 indicates that the record value is shorter than the comparison string, and a result above 1 indicates that it's longer.

  2. The second approach returns a count of characters that match at the same position in each string, and the overall percentage of the match.

  3. The third approach returns a 1 or a 0, respectively, ...
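The first two approaches can be sketched roughly as follows. This is an illustrative Python sketch, not the recipe's own Access code. The function names, the sample names other than Johnson, the case-sensitive comparison, and the choice to divide the match count by the longer string's length are all assumptions made for the example.

```python
def length_ratio(record_value, target):
    """Approach 1 (sketch): record length relative to the target length.

    Returns 1.0 for equal lengths, below 1 when the record value is
    shorter, and above 1 when it is longer.
    """
    if not target:
        raise ValueError("target must not be empty")
    return len(record_value) / len(target)


def positional_matches(record_value, target):
    """Approach 2 (sketch): count characters equal at the same position.

    Assumption: the match percentage divides the count by the length of
    the longer string, and the comparison is case-sensitive.
    """
    # zip stops at the end of the shorter string, so extra trailing
    # characters in the longer string are never counted as matches.
    count = sum(1 for a, b in zip(record_value, target) if a == b)
    longest = max(len(record_value), len(target))
    fraction = count / longest if longest else 1.0
    return count, fraction


if __name__ == "__main__":
    # Hypothetical sample values; the actual contents of Figure 6-23
    # are not reproduced here.
    for name in ["Johnson", "Johnsen", "Jonson", "Johnston"]:
        ratio = length_ratio(name, "Johnson")
        count, fraction = positional_matches(name, "Johnson")
        print(f"{name:10} length ratio={ratio:.2f} "
              f"matches={count} match fraction={fraction:.2f}")
```

With these assumptions, Johnsen scores a length ratio of 1 and six positional matches against Johnson, while Jonson scores only two positional matches, because dropping the h shifts every later character out of position. That sensitivity to insertions and deletions is worth keeping in mind when choosing a position-based rule.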
