Summarizing Results by Domain

Getting an overview of the sorts of domains (educational, commercial, foreign, and so forth) found in the results of a Google query.

You want to know about a topic, so you do a search. But what do you have? A list of pages. You can’t get a good idea of the types of pages these are without taking a close look at the list of sites.

This hack is an attempt to get a “snapshot” of the types of sites that result from a query. It does this by taking a “suffix census,” a count of the different domain suffixes (.com, .edu, .uk, and so on) that appear in the search results.
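To make the idea of a suffix census concrete, here is a minimal sketch of the counting step on its own. It is not the hack's script and does not call the Google API. The @urls list is made-up sample data, and the helper names suffix_of and suffix_census are hypothetical, chosen only for illustration. The sketch assumes each URL starts with a scheme such as http:// and simply treats the last dot-separated label of the host name as the suffix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample of result URLs; a real run would collect these
# from the search results instead.
my @urls = (
    'http://www.example.com/page.html',
    'http://cs.example.edu/~user/',
    'https://example.org/about',
    'http://www.example.co.uk/index.html',
    'http://shop.example.com:8080/item?id=1',
);

# Return the last dot-separated label of a URL's host name
# (com, edu, uk, ...), or undef if none can be found.
# Assumption: the URL has a scheme and no user:password@ part.
sub suffix_of {
    my ($url) = @_;
    my ($host) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/:?#]+)}i or return;
    my ($suffix) = $host =~ /\.([a-z]+)\.?$/i or return;
    return lc $suffix;
}

# Tally suffixes across a list of URLs; returns a hash reference
# mapping each suffix to the number of URLs that carry it.
sub suffix_census {
    my @list = @_;
    my %count;
    for my $url (@list) {
        my $suffix = suffix_of($url);
        next unless defined $suffix;   # skip anything unparseable
        $count{$suffix}++;
    }
    return \%count;
}

my $census = suffix_census(@urls);

# Print the most common suffixes first, ties in alphabetical order.
for my $suffix (sort { $census->{$b} <=> $census->{$a} or $a cmp $b }
                keys %$census) {
    printf ".%-6s %d\n", $suffix, $census->{$suffix};
}
```

With the sample list, com is counted twice and edu, org, and uk once each. Note that www.example.co.uk is filed under uk, since only the final label is considered; numeric IP addresses have no alphabetic suffix and are skipped.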

The hack is best suited to link: queries, where it gives a good idea of what kinds of domains (commercial, educational, military, foreign, etc.) link to a particular page.

You could also run it to see where technical terms, slang terms, and unusual words are turning up. Which kinds of sites mention a particular singer more often? Or a political figure? Does the word “democrat” come up more often on .com or .edu sites?

Of course, this snapshot doesn’t provide a complete inventory, but as overviews go, it’s rather interesting.

The Code

#!/usr/local/bin/perl
# suffixcensus.cgi
# Generates a snapshot of the kinds of sites responding to a
# query. The suffix is the .com, .net, or .uk part.
# suffixcensus.cgi is called as a CGI with form input

# Your Google API developer's key
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file
my $google_wdsl = "./GoogleSearch.wsdl";

# Number of times to loop, retrieving 10 results at ...

