Summarize Results by Domain

Get an overview of the sorts of domains (educational, commercial, foreign, and so forth) found in the results of a Google query.

You want to know about a topic, so you do a search. But what do you have? A list of pages. You can’t get a good idea of the types of pages these are without taking a close look at the list of sites.

This hack is an attempt to get a snapshot of the types of sites that result from a query. It does this by taking a suffix census: a count of the domain suffixes (.com, .edu, .uk, and so on) that appear in search results.
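To make the idea of a suffix census concrete, here is a minimal sketch of the counting step on its own. It is not the hack's script: it makes no Google API calls, the sample URLs are made up, and the names suffix_of and count_suffixes are hypothetical helpers introduced only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the last label of a URL's host name
# ("com", "edu", "uk", ...). Returns nothing for URLs without a
# dotted, alphabetic suffix (for example, a bare IP address).
sub suffix_of {
    my ($url) = @_;
    # Capture the host, skipping the scheme, optional user info, and port.
    my ($host) = $url =~ m{^[a-z][a-z0-9+.-]*://(?:[^/@]*@)?([^/:?#]+)}i
        or return;
    return unless $host =~ /\.([a-z]{2,})\.?$/i;
    return lc $1;
}

# Hypothetical helper: tally how often each suffix occurs in a list of URLs.
sub count_suffixes {
    my @urls = @_;
    my %count;
    for my $url (@urls) {
        my $suffix = suffix_of($url);
        next unless defined $suffix;
        $count{$suffix}++;
    }
    return %count;
}

# Made-up sample URLs standing in for real search results.
my @results = (
    'http://www.example.com/page.html',
    'http://cs.example.edu/~user/',
    'https://news.example.co.uk/story?id=1',
    'http://shop.example.com:8080/',
    'http://www.example.org/',
);

# Print the census, most frequent suffix first.
my %census = count_suffixes(@results);
for my $suffix (sort { $census{$b} <=> $census{$a} || $a cmp $b } keys %census) {
    printf "%-6s %d\n", ".$suffix", $census{$suffix};
}
```

For these five sample URLs the census reports two .com hits and one each for .edu, .org, and .uk. Note that this sketch counts only the final label, so a .co.uk address is tallied under .uk.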

The census is best suited to link: queries, where it shows what kinds of domains (commercial, educational, military, foreign, etc.) are linking to a particular page.

You could also run it to see where technical terms, slang terms, and unusual words are turning up. Which kinds of sites mention a particular singer more often? Or a political figure? Does the word “democrat” come up more often on .com or .edu sites?

Of course, this snapshot doesn’t provide a complete inventory, but as overviews go, it’s rather interesting.

The Code

Save the code as suffixcensus.cgi, a CGI script [“How to Run the Hacks” in the Preface] on your web server:

#!/usr/local/bin/perl
# suffixcensus.cgi
# Generates a snapshot of the kinds of sites responding to a
# query. The suffix is the .com, .net, or .uk part.
# suffixcensus.cgi is called as a CGI with form input.
     
# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL ...
