Scraping the Google Phonebook

Create a comma-delimited file from a list of phone numbers returned by Google.

Just because Google’s API doesn’t support the phonebook: [Hack #17] syntax doesn’t mean that you can’t make use of Google phonebook data.

This simple Perl script takes a page of Google phonebook: results and produces a comma-delimited text file suitable for import into Excel or your average database application. The script doesn’t use the Google API, though, because the API doesn’t yet support phonebook lookups. Instead, you’ll need to run the search in your trusty web browser and save the results to your computer’s hard drive as an HTML file. Point the script at the HTML file and it’ll do its thing.

Which results should you save? You have two choices depending on which syntax you’re using:

  • If you’re using the phonebook: syntax, save the second page of results, reached by clicking the “More business listings...” or “More residential listings...” links on the initial results page.

  • If you’re using the bphonebook: or rphonebook: syntax, simply save the first page of results. Depending on how many pages of results you have, you might have to run the program several times.

Because this program is so simple, you might be tempted to plug this code into a program that uses LWP::Simple to automatically grab result pages from Google, automating the entire process. You should know that accessing Google with automated queries outside of the Google API is against their Terms of Service.

The Code

# phonebook2csv
# Google Phonebook results in CSV suitable for import into Excel
# Usage: perl phonebook2csv.pl < results.html > results.csv

# CSV header
print qq{"name","phone number","address"\n};

my @listings = split /<hr size=1>/, join '', <>;

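# Skip the first and last chunks: the text before the first <hr size=1>
# and after the last one is not a listing
foreach (@listings[1..($#listings-1)]) {
        s!\n!!g; # drop spurious newlines
        s!<.+?>!!g; # drop all HTML tags
        s!"!""!g; # double escape " marks
        print '"' . join('","', (split /\s+-\s+/)[0..2]) . "\"\n";
}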
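
To see what the loop does to a single listing, here is a minimal sketch that wraps the same three substitutions and the split in a subroutine. The name listing_to_csv and the sample markup are made up for illustration; the HTML that Google actually returns may look different.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): turn one chunk of
# listing HTML into a CSV line, using the same substitutions as the loop.
sub listing_to_csv {
    my ($chunk) = @_;
    $chunk =~ s!\n!!g;      # drop spurious newlines
    $chunk =~ s!<.+?>!!g;   # drop all HTML tags
    $chunk =~ s!"!""!g;     # double any embedded " marks for CSV
    # Fields are separated by " - "; keep name, phone number, address
    my @fields = (split /\s+-\s+/, $chunk)[0..2];
    # Assumption: a missing field becomes an empty string
    return '"' . join('","', map { defined $_ ? $_ : '' } @fields) . '"';
}

# Made-up sample listing; real Google markup is an assumption here
my $sample = qq{<font size=-1><b>John Doe</b> - (555) 555-5555 - 1 Doe Street, Doe, OR 99999</font>\n};
print listing_to_csv($sample), "\n";
```

The hyphen inside the phone number survives because the split pattern only matches a hyphen with whitespace on both sides.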

Running the Hack

Run the script from the command line, redirecting the saved phonebook results HTML file into it and redirecting its output to the CSV file you wish to create (or to which you wish to append additional results). For example, using results.html as our input and results.csv as our output:

$ perl phonebook2csv.pl < results.html > results.csv

Leaving off the > and CSV filename sends the results to the screen for your perusal:

$ perl phonebook2csv.pl < results.html
"name","phone number","address"
"John Doe","(555) 555-5555","Wandering, TX 98765"
"Jane Doe","(555) 555-5555","Horsing Around, MT 90909"
"John and Jane Doe","(555) 555-5555","Somewhere, CA 92929"
"John Q. Doe","(555) 555-5555","Freezing, NE 91919"
"Jane J. Doe","(555) 555-5555","1 Sunnyside Street, Tanning, FL 90210"
"John Doe, Jr.","(555) 555-5555","Beverly Hills, CA 90210"
"John Doe","(555) 555-5555","1 Lost St., Yonkers, NY 91234"
"John Doe","(555) 555-5555","1 Doe Street, Doe, OR 99999"
"John Doe","(555) 555-5555","Beverly Hills, CA 90210"

Using >> instead of > before the CSV filename appends the current set of results to the CSV file, creating it if it doesn’t already exist. This is useful for combining several saved results pages into a single file:

$ perl phonebook2csv.pl < results_1.html > results.csv
$ perl phonebook2csv.pl < results_2.html >> results.csv
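
Because the script prints the CSV header on every run, a file built up by appending contains one header line per saved page. One possible cleanup, sketched here with a hypothetical helper named drop_repeated_headers and made-up sample rows, is to keep the first header and discard later copies:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): keep the first
# occurrence of the header line and drop any later copies.
sub drop_repeated_headers {
    my ($header, @lines) = @_;
    my $seen = 0;
    # Non-header lines always pass; a header line passes only the first time
    return grep { $_ ne $header || !$seen++ } @lines;
}

my $header = qq{"name","phone number","address"\n};

# Made-up stand-in for a CSV file produced by two appended runs
my @combined = (
    $header,
    qq{"John Doe","(555) 555-5555","Wandering, TX 98765"\n},
    $header,    # second header, written by the second run
    qq{"Jane Doe","(555) 555-5555","Horsing Around, MT 90909"\n},
);

print drop_repeated_headers($header, @combined);
```

The same subroutine could be fed the lines of the real combined file before importing it into a spreadsheet or database.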
