Summarizing Results by Domain
Getting an overview of the sorts of domains (educational, commercial, foreign, and so forth) found in the results of a Google query.
You want to know about a topic, so you do a search. But what do you have? A list of pages. You can’t get a good idea of the types of pages these are without taking a close look at the list of sites.
This hack is an attempt to get a “snapshot” of the types of sites that result from a query. It does this by taking a “suffix census,” a count of the different domains that appear in search results.
This is best suited to running link: queries, providing a good idea of what kinds of domains (commercial, educational, military, foreign, etc.) are linking to a particular page.
You could also run it to see where technical terms, slang terms, and unusual words are turning up. Which pages mention a particular singer more often? Or a political figure? Does the word “democrat” come up more often on one kind of domain than on another?
Of course, this snapshot doesn’t provide a complete inventory, but as overviews go, it’s rather interesting.
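Before looking at the full CGI script, it may help to see the counting step in isolation. The following is a minimal sketch rather than the hack's own code: it assumes the result URLs are already in hand (the sample list is made up), it does not touch the Google API, and the subroutine names suffix_of and suffix_census are hypothetical, chosen for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the last dot-separated label of a URL's host name
# (for example "com" or "uk"), or undef if none can be found.
# This is a deliberately simple pattern: it does not handle
# user:password@host URLs and treats IP addresses like names.
sub suffix_of {
    my ($url) = @_;
    # Capture the host: everything after "scheme://" up to the
    # first "/", ":", "?" or "#".
    my ($host) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/:?#]+)}i
        or return;
    # A host without a dot (e.g. "localhost") has no suffix to count.
    my ($suffix) = $host =~ /\.([a-z0-9-]+)\.?$/i
        or return;
    return lc $suffix;
}

# Count how many URLs fall under each suffix.
sub suffix_census {
    my @urls = @_;
    my %count;
    for my $url (@urls) {
        my $suffix = suffix_of($url);
        next unless defined $suffix;
        $count{$suffix}++;
    }
    return %count;
}

# Made-up sample URLs standing in for search results (an assumption).
my @results = (
    'http://www.example.com/page.html',
    'http://www.example.edu/~someone/',
    'http://news.example.co.uk/story',
    'http://example.com/',
    'http://www.example.org/about',
);

my %census = suffix_census(@results);

# Most common suffix first; ties broken alphabetically.
for my $suffix (sort { $census{$b} <=> $census{$a} || $a cmp $b } keys %census) {
    printf ".%-6s %d\n", $suffix, $census{$suffix};
}
```

With the sample list, .com is counted twice and each of the other suffixes once. Note that only the final label is counted, so a .co.uk address is tallied under .uk.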
#!/usr/local/bin/perl
# suffixcensus.cgi
# Generates a snapshot of the kinds of sites responding to a
# query. The suffix is the .com, .net, or .uk part.
# suffixcensus.cgi is called as a CGI with form input

# Your Google API developer's key
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file
my $google_wdsl = "./GoogleSearch.wsdl";

# Number of times to loop, retrieving 10 results at ...