Summarize Results by Domain

Get an overview of the sorts of domains (educational, commercial, foreign, and so forth) found in the results of a Google query.

You want to know about a topic, so you do a search. But what do you have? A list of pages. You can’t get a good idea of the types of pages these are without taking a close look at the list of sites.

This hack attempts to get a snapshot of the types of sites that a query returns. It does this by taking a suffix census: a count of the different domain suffixes (.com, .edu, .uk, and so on) that appear in the search results.
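The counting step behind a suffix census can be sketched in a few lines of Perl. This is only an illustration, not the hack's actual listing: the suffix_census helper and the sample URLs are made up for the example, and it assumes the result URLs have already been collected from a search.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the hack's listing): tally the
# final dot-separated label of each URL's host name.
sub suffix_census {
    my @urls = @_;
    my %count;
    for my $url (@urls) {
        # Capture the host, skipping any scheme, port, path, or query.
        next unless $url =~ m{^(?:\w+://)?([^/:?#]+)};
        my $host = lc $1;
        # The suffix is the alphabetic label after the last dot;
        # bare IP addresses have none and are skipped.
        next unless $host =~ /\.([a-z]+)$/;
        $count{$1}++;
    }
    return %count;
}

# Made-up URLs standing in for a page of search results.
my @results = (
    'http://www.example.com/index.html',
    'http://research.example.edu/papers/',
    'http://shop.example.com:8080/cart',
    'http://www.example.co.uk/news',
);

my %census = suffix_census(@results);

# Most common suffix first; ties are broken alphabetically.
for my $suffix (sort { $census{$b} <=> $census{$a} || $a cmp $b } keys %census) {
    printf ".%s\t%d\n", $suffix, $census{$suffix};
}
```

Only the last label of the host is counted, so a .co.uk address is tallied under .uk, which matches the idea of the suffix as "the .com, .net, or .uk part."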

This works best with link: queries, giving a good idea of what kinds of domains (commercial, educational, military, foreign, etc.) link to a particular page.

You could also run it to see where technical terms, slang terms, and unusual words are turning up. Which kinds of sites mention a particular singer more often? Or a political figure? Does the word “democrat” come up more often on .com or .edu sites?

Of course, this snapshot doesn’t provide a complete inventory, but as overviews go, it’s rather interesting.

The Code

Save the code as suffixcensus.cgi, a CGI script [“How to Run the Hacks” in the Preface] on your web server:

#!/usr/local/bin/perl
# suffixcensus.cgi
# Generates a snapshot of the kinds of sites responding to a
# query. The suffix is the .com, .net, or .uk part.
# suffixcensus.cgi is called as a CGI with form input.
     
# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL ...
