
Chapter 7, Names and Places
Hack #81: Clean Up U.S. Addresses
regular expressions, of course! They are ideal for such clean-up operations as
normalizing street types/prefixes/suffixes, dropping extraneous intra-building
indicators, and excising the general detritus of human input.
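As a rough illustration of the kinds of substitutions just described, here is a minimal Perl sketch. It is not the Fundrace code: the helper name `tidy_address`, the handful of abbreviations it knows, and the exact patterns are all assumptions chosen for the example, and real address data will need many more rules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not part of the Fundrace script.
sub tidy_address {
    my ($addr) = @_;

    # Excise stray punctuation, collapse runs of whitespace, trim the ends.
    $addr =~ s/[.,]//g;
    $addr =~ s/\s+/ /g;
    $addr =~ s/^\s+|\s+$//g;

    # Drop trailing intra-building indicators such as "Apt 4B" or "#2".
    $addr =~ s/\s+(?:apt|apartment|suite|ste|unit)\s+\w+$//i;
    $addr =~ s/\s+#\s*\w+$//;

    # Expand a few street-type abbreviations at the end of the string.
    # (Assumed sample list; a real table would be much longer.)
    my %types = (
        st   => 'Street',
        ave  => 'Avenue',
        blvd => 'Boulevard',
        rd   => 'Road',
    );
    $addr =~ s/\b(st|ave|blvd|rd)$/$types{lc $1}/ie;

    return $addr;
}

print tidy_address('350 Fifth Ave., Suite 3300'), "\n";   # prints "350 Fifth Avenue"
```

The order of the steps matters in this sketch: the unit indicator is removed first so that the street type ends up at the end of the string, where the last substitution expects to find it.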
The Code
Fundrace used the following Perl code to clean up each street address so that
it could be successfully geocoded [Hack #79]. This allowed us to use the
spatial data in thousands of additional records to make political money maps.
The code consists mostly of a series of regular expressions that expand odd
abbreviations. It was written with New York City street-naming idiosyncrasies
in mind, but a selection from it should work anywhere in the U.S.:
#!/usr/bin/perl
use strict;
my $addr = shift;
print cleanAddr($addr);
# Map spelled-out ordinals and cardinals to their numeric forms
our $spelled_nums = {
    first => '1st',
    second => '2nd',
    third => '3rd',
    fourth => '4th',
    fifth => '5th',
    sixth => '6th',
    seventh => '7th',
    eighth => '8th',
    ninth => '9th',
    tenth => '10th',
    eleventh => '11th',
    twelfth => '12th',
    thirteenth => '13th',
    fourteenth => '14th',
    fifteenth => '15th',
    sixteenth => '16th',
    seventeenth => '17th',
    eighteenth => '18th',
    nineteenth => '19th',
    one => 1,
    two => 2,
    three => 3,
    four => 4,
    five => 5,
    six => 6,
    seven => 7,
    eight => 8,
    nine => 9,
    ten => 10,
};