Summarize Results by Domain
Get an overview of the sorts of domains (educational, commercial, foreign, and so forth) found in the results of a Google query.
You want to know about a topic, so you do a search. But what do you have? A list of pages. You can’t get a good idea of the types of pages these are without taking a close look at the list of sites.
This hack is an attempt to get a snapshot of the types of sites that result from a query. It does this by taking a suffix census: a count of the domain suffixes (.com, .edu, .uk, and so on) that appear in the search results.
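To make the idea of a suffix census concrete, here is a minimal sketch of the counting step on its own, with no Google API calls involved. It is not the hack's script. The helper names suffix_of and suffix_census are hypothetical, the URLs are made-up sample data standing in for real search results, and the sketch assumes that the last dot-separated label of the hostname is the suffix of interest.

```perl
#!/usr/bin/perl
# Sketch of a suffix census over a hard-coded list of URLs.
# The sample URLs and both helper names are illustrative assumptions.
use strict;
use warnings;

# Return the last label of a URL's hostname (com, edu, uk, ...), lowercased.
# Returns undef when there is no dotted hostname or the host is an IP address.
sub suffix_of {
    my ($url) = @_;
    # Skip an optional scheme and userinfo, then capture up to port or path.
    my ($host) = $url =~ m{^(?:[a-z][a-z0-9+.-]*://)?(?:[^/@]*@)?([^/:?#]+)}i
        or return;
    return if $host =~ /^[\d.]+$/;    # bare IP addresses carry no suffix
    my ($suffix) = $host =~ /\.([a-z]+)\.?$/i
        or return;
    return lc $suffix;
}

# Count how many URLs fall under each suffix; returns a hash of suffix => count.
sub suffix_census {
    my (@urls) = @_;
    my %count;
    for my $url (@urls) {
        my $suffix = suffix_of($url);
        next unless defined $suffix;
        $count{$suffix}++;
    }
    return %count;
}

# Made-up sample data standing in for a page of search results.
my @sample_urls = (
    'http://www.example.com/',
    'http://shop.example.com/catalog',
    'https://cs.example.edu/~user/paper.html',
    'http://www.example.org/about',
    'http://news.example.co.uk/story?id=1',
    'http://192.0.2.10/index.html',
);

my %census = suffix_census(@sample_urls);

# Print suffixes from most to least frequent, ties broken alphabetically.
for my $suffix (sort { $census{$b} <=> $census{$a} || $a cmp $b } keys %census) {
    printf ".%-5s %d\n", $suffix, $census{$suffix};
}
```

With this sample list, .com is counted twice, .edu, .org, and .uk once each, and the bare IP address is skipped. Note that the sketch files example.co.uk under .uk rather than .co.uk; whether a census should distinguish second-level suffixes is a design choice.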
This is best suited to running link: queries, providing a good idea of what kinds of domains (commercial, educational, military, foreign, etc.) are linking to a particular page.
You could also run it to see where technical terms, slang terms, and unusual words are turning up. Which pages mention a particular singer more often? Or a political figure? Does the word “democrat” come up more often on .com or .edu sites?
Of course, this snapshot doesn’t provide a complete inventory, but as overviews go, it’s rather interesting.
Save the code as suffixcensus.cgi, a CGI script [“How to Run the Hacks” in the Preface], on your web server:
#!/usr/local/bin/perl
# suffixcensus.cgi
# Generates a snapshot of the kinds of sites responding to a
# query. The suffix is the .com, .net, or .uk part.
# suffixcensus.cgi is called as a CGI with form input.

# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL ...