Restrict Searches to Top-Level Results

Separate out search results by the depth at which they appear in a site.

Google’s a mighty big haystack in which to find the needle you seek. And there’s more, so much more, beyond it: some experts believe that Google and its ilk index only a small fraction of the pages available on the Web.

Because the Web’s growing all the time, researchers have to come up with lots of different tricks to narrow down search results. Tricks and, thanks to the Google API, tools. This hack separates search results that appear at the top level of a domain from those buried in directories beneath it.
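The core test here is deciding whether a result URL sits at the top level of its domain. What follows is a minimal sketch of one way to make that test. It assumes that "top level" means no directory lies between the page and the domain root, and the original gootop.cgi may draw that line differently. The helpers `url_depth` and `is_top_level` are hypothetical names, and the URLs are made-up examples standing in for real search results.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from gootop.cgi: count how many
# directory levels sit between the domain root and the page a URL points to.
sub url_depth {
    my ($url) = @_;
    # Skip an optional scheme ("http://") and the host, then capture
    # the path, stopping before any query string or fragment.
    my ($path) = $url =~ m{^(?:[a-z][a-z0-9+.-]*://)?[^/?#]*([^?#]*)}i;
    # Split the path into non-empty segments ("/a//b/" gives "a", "b").
    my @segments = grep { length } split m{/}, $path;
    # Without a trailing slash, the last segment names a file, not a
    # directory, so it doesn't add a level. Assumption: "/about" is
    # treated as a file even though a server might serve it as a directory.
    pop @segments if @segments && $path !~ m{/$};
    return scalar @segments;
}

# Assumption: "top level" means depth 0, i.e. the page sits at the root.
sub is_top_level { return url_depth($_[0]) == 0 }

# Made-up result URLs, standing in for whatever a search returns.
my @results = (
    'http://www.example.com/',
    'http://www.example.org/index.html',
    'http://news.example.net/2004/05/story.html',
    'http://isp.example.net/~jrandom/',
);

# Partition the results into top-level pages and everything beneath.
my @top   = grep {  is_top_level($_) } @results;
my @below = grep { !is_top_level($_) } @results;

print "Top level:\n";
print "  $_\n" for @top;
print "Beneath:\n";
print "  $_\n" for @below;
```

Treating an extensionless last segment such as `/about` as a file keeps those pages at the top level. If your results favor directory-style URLs without trailing slashes, you might flip that rule.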

Why would you want to do this?

  • Clear away clutter when searching for proper names. If you’re searching for general information about a proper name, this is one way to clear out passing mentions in news stories and similar articles. For example, a political leader such as Tony Blair might be mentioned in a story that offers no substantive information about the man himself. Limiting your results to pages on the top level of a domain would avoid most of those passing mentions.

  • Find patterns in the association between highly ranked domains and certain keywords.

  • Narrow search results to only those bits that sites deem important enough to have in their virtual foyers.

  • Skip past subsites, such as home pages created by J. Random User on his service provider’s web server.

The Code

Save the code as a CGI script [“How to Run the Hacks” in the Preface] named gootop.cgi:

#!/usr/local/bin/perl
# gootop.cgi
# Separates ...
