Summarize Results by Domain
Get an overview of the sorts of domains (educational, commercial, foreign, and so forth) found in the results of a Google query.
You want to know about a topic, so you do a search. But what do you have? A list of pages. You can’t get a good idea of the types of pages these are without taking a close look at the list of sites.
This hack is an attempt to get a snapshot of the types of sites that result from a query. It does this by taking a suffix census: a count of the different domain suffixes (.com, .edu, .uk, and so on) that appear in the search results.
This is ideal for running link: queries, providing a good idea of what kinds of domains (commercial, educational, military, foreign, etc.) are linking to a particular page.
You could also run it to see where technical terms, slang terms, and unusual words are turning up. Which kinds of sites mention a particular singer more often? Or a political figure? Does the word “democrat” come up more often on .com or .edu sites?
Of course, this snapshot doesn’t provide a complete inventory, but as overviews go, it’s rather interesting.
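The counting step behind a suffix census can be sketched in a few lines of Perl. This is only a minimal illustration, not the hack's script: it works on a hard-coded list of invented URLs instead of live Google results, and the helper names suffix_of and suffix_census are hypothetical, chosen just for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the final dot-separated label of a URL's
# hostname (for example "edu" or "uk"), or undef if there isn't one.
sub suffix_of {
    my ($url) = @_;
    # Skip an optional scheme, then capture the host up to "/", ":", "?" or "#".
    my ($host) = $url =~ m{^(?:[a-z][a-z0-9+.-]*://)?([^/:?#]+)}i or return;
    # A bare IP address has no meaningful suffix.
    return if $host =~ /^\d+(?:\.\d+){3}$/;
    my ($suffix) = $host =~ /\.([a-z]+)$/i or return;
    return lc $suffix;
}

# Hypothetical helper: tally how many URLs fall under each suffix.
sub suffix_census {
    my @urls = @_;
    my %count;
    for my $url (@urls) {
        my $suffix = suffix_of($url);
        next unless defined $suffix;
        $count{$suffix}++;
    }
    return %count;
}

# Invented sample data standing in for search results.
my @sample_urls = (
    'http://www.example.com/',
    'http://www.example.edu/research/',
    'http://example.org:8080/index.html',
    'http://www.example.co.uk/news',
    'http://shop.example.com/item?id=1',
);

my %census = suffix_census(@sample_urls);

# Print the most common suffixes first, breaking ties alphabetically.
for my $suffix (sort { $census{$b} <=> $census{$a} || $a cmp $b } keys %census) {
    printf ".%-6s %d\n", $suffix, $census{$suffix};
}
```

With the sample list above, .com is counted twice and each of the other suffixes once. Note that this sketch looks only at the last label of the hostname, so www.example.co.uk is filed under .uk rather than .co.uk.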
The Code
Save the code as suffixcensus.cgi, a CGI script [“How to Run the Hacks” in the Preface] on your web server:
#!/usr/local/bin/perl
# suffixcensus.cgi
# Generates a snapshot of the kinds of sites responding to a
# query. The suffix is the .com, .net, or .uk part.
# suffixcensus.cgi is called as a CGI with form input.
# Your Google API developer's key.
my $google_key='insert key here';
# Location of the GoogleSearch WSDL ...