Google Hacks: 100 Industrial-Strength Tips & Tricks

By Tara Calishain, Rael Dornfest


Table of Contents

Chapter 1: Searching Google
Google's front page is deceptively simple: a search form and a couple of buttons. Yet that basic interface—so alluring in its simplicity—belies the power of the Google engine underneath and the wealth of information at its disposal. And if you use Google's search syntax to its fullest, the Web is your research oyster.
But first you need to understand what the Google index isn't.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Hacks #1-28
Additional content appearing in this section has been removed.
What Google Isn't
The Internet is not a library. The library metaphor presupposes so many things—a central source for resource information, a paid staff dutifully indexing new material as it comes in, a well-understood and rigorously adhered-to ontology—that trying to think of the Internet as a library can be misleading.
Let's take a moment to dispel some of these myths right up front.
  • Google's index is a snapshot of all that there is online. No search engine, not even Google, knows everything. There's simply too much and it's all flowing too fast to keep up. Then there's the content Google notices but chooses not to index at all: movies, audio, Flash animations, and innumerable specialty data formats.
  • Everything on the Web is credible. It's not. There are things on the Internet that are biased, distorted, or just plain wrong—whether intentional or not. Visit the Urban Legends Reference Pages (http://www.snopes.com/) for a taste of the kinds of urban legends and other misinformation making the rounds of the Internet.
  • Content filtering will protect you from offensive material. While Google's optional content filtering is good, it's certainly not perfect. You may well come across an offending item among your search results.
  • Google's index is a static snapshot of the Web. It simply cannot be so. The index, as with the Web, is always in flux. A perpetual stream of spiders delivers new-found pages, notes changes, and reports pages now gone. And the Google methodology itself changes as its designers and maintainers learn. Don't get into a rut of searching a particular way; to do so is to deprive yourself of the benefit of Google's evolution.
Additional content appearing in this section has been removed.
What Google Is
The way most people use an Internet search engine is to drop in a couple of keywords and see what turns up. While in certain domains that can yield some decent results, it's becoming less and less effective as the Internet gets larger and larger.
Google provides some special syntaxes to help guide its engine in understanding what you're looking for. This section of the book takes a detailed look at Google's syntax and how best to use it. Briefly:
Within the page
Google supports syntaxes that allow you to restrict your search to certain components of a page, such as the title or the URL.
Kinds of pages
Google allows you to restrict your search to certain kinds of pages, such as sites from the educational (EDU) domain or pages that were indexed within a particular period of time.
Kinds of content
With Google, you can find a variety of file types; for example, Microsoft Word documents, Excel spreadsheets, and PDF files. You can even find specialty web pages the likes of XML, SHTML, or RSS.
Special collections
Google has several different search properties, but some of them aren't as removed from the web index as you might think. You may be aware of Google's index of news stories and images, but did you know about Google's university searches? Or how about the special searches that allow you to restrict your searches by topic, to BSD, Linux, Apple, Microsoft, or the U.S. government?
These special syntaxes are not mutually exclusive. On the contrary, it's in the combination that the true magic of Google lies. Search for certain kinds of pages in special collections or different page elements on different types of pages.
If you get one thing out of this book, get this: the possibilities are (almost) endless. This book can teach you techniques, but if you just learn them by rote and then never apply them, they won't do you any good.
Additional content appearing in this section has been removed.
Google Basics
Generally speaking, there are two types of search engines on the Internet. The first is called the searchable subject index. This kind of search engine searches only the titles and descriptions of sites, and doesn't search individual pages. Yahoo! is a searchable subject index. Then there's the full-text search engine, which uses computerized "spiders" to index millions, sometimes billions, of pages. These pages can be searched by title or content, allowing for much narrower searches than a searchable subject index does. Google is a full-text search engine.
Whenever you search for more than one keyword at a time, a search engine has a default method of handling those keywords. Will the engine search for both keywords or for either keyword? The answer is called a Boolean default; search engines can default to Boolean AND (it'll search for both keywords) or Boolean OR (it'll search for either keyword). Of course, even if a search engine defaults to searching for both keywords (AND) you can usually give it a special command to instruct it to search for either keyword (OR). But the engine has to know what to do if you don't give it instructions.
Google's Boolean default is AND; that means if you enter query words without modifiers, Google will search for all of them. If you search for:
snowblower Honda "Green Bay"
Google will search for all the words. If you want to specify that any one of the terms is acceptable, put an OR between each item:
snowblower OR snowmobile OR "Green Bay"
If you want to definitely have one term and have one of two or more other terms, you group them with parentheses, like this:
snowblower (snowmobile OR "Green Bay")
This query searches for the word "snowmobile" or phrase "Green Bay" along with the word "snowblower." A stand-in for OR borrowed from the computer programming realm is the | (pipe) character, as in:
snowblower (snowmobile | "Green Bay")
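If you assemble queries like these in a script, the grouping rules above can be sketched in a few lines of Python. This is only an illustration: the helper names phrase, all_of, and any_of are made up for this example and are not part of any Google tool.

```python
def phrase(words):
    """Wrap words in double quotes so they are searched as a phrase."""
    return '"' + words.strip('"') + '"'


def all_of(*parts):
    """Space-separated parts rely on Google's default Boolean AND."""
    return " ".join(parts)


def any_of(*parts, pipe=False):
    """Group alternatives in parentheses, joined by OR or the | stand-in."""
    joiner = " | " if pipe else " OR "
    return "(" + joiner.join(parts) + ")"


# Rebuild the example queries shown above.
query = all_of("snowblower", any_of("snowmobile", phrase("Green Bay")))
print(query)  # snowblower (snowmobile OR "Green Bay")
```

Passing pipe=True to any_of produces the | form of the same query.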
If you want to specify that a query item must not appear in your results, use a
Additional content appearing in this section has been removed.
The Special Syntaxes
In addition to the basic AND, OR, and quoted strings, Google offers some rather extensive special syntaxes for honing your searches.
Because Google is a full-text search engine, it indexes entire web pages instead of just titles and descriptions. Additional commands, called special syntaxes, let Google users search specific parts of web pages or specific types of information. This comes in handy when you're dealing with 2 billion web pages and need every opportunity to narrow your search results. Specifying that your query words must appear only in the title or URL of a returned web page is a great way to make your results very specific without making your keywords themselves too specific.
Some of these syntaxes work well in combination. Others fare not quite as well. Still others do not work at all. For detailed discussion on what does and does not mix, see [Hack #8].
intitle:
intitle: restricts your search to the titles of web pages. The variation, allintitle:, finds pages wherein all the words specified appear in the title of the web page. It's probably best to avoid the allintitle: variation, because it doesn't mix well with some of the other syntaxes.
intitle:"george bush"
allintitle:"money supply" economics
inurl:
inurl: restricts your search to the URLs of web pages. This syntax tends to work well for finding search and help pages, because they tend to be rather regular in composition. An allinurl: variation finds all the words listed in a URL but doesn't mix well with some other special syntaxes.
inurl:help
allinurl:search help
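The example queries above all follow the same pattern: the syntax name, a colon, then the term or quoted phrase. A minimal Python sketch of that pattern follows; the function names with_syntax and all_syntax are hypothetical, chosen only for this illustration.

```python
def with_syntax(syntax, value):
    """Build e.g. intitle:"george bush" or inurl:help.

    Multi-word values are quoted so the syntax applies to the whole phrase.
    """
    if " " in value and not value.startswith('"'):
        value = '"' + value + '"'
    return syntax + ":" + value


def all_syntax(syntax, *terms):
    """Build the all- variation, e.g. allinurl:search help."""
    return "all" + syntax + ":" + " ".join(terms)


print(with_syntax("intitle", "george bush"))  # intitle:"george bush"
print(all_syntax("inurl", "search", "help"))  # allinurl:search help
```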
intext:
Additional content appearing in this section has been removed.
Advanced Search
The Google Advanced Search goes well beyond the capabilities of the default simple search, providing a powerful fill-in form for date searching, filtering, and more.
Google's default simple search allows you to do quite a bit, but not all. The Google Advanced Search (http://www.google.com/advanced_search?hl=en) page provides more options such as date search and filtering, with "fill in the blank" searching options for those who don't take naturally to memorizing special syntaxes.
Most of the options presented on this page are self-explanatory, but we'll take a quick look at the kinds of searches that you really can't do with any ease using the simple search's single text-field interface.
Because Google uses Boolean AND by default, it's sometimes hard to logically build out the nuances of just the query you're aiming for. Using the text boxes at the top of the Advanced Search page, you can specify words that must appear, exact phrases, lists of words, at least one of which must appear, and words to be excluded.
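Those four text boxes can also be filled in from a script by building the search URL yourself. The sketch below assumes the boxes map to the URL parameters as_q, as_epq, as_oq, and as_eq; these names are not given in the text above and should be checked against the URL the form actually produces for you. The function name and its keyword arguments are likewise made up for this example.

```python
from urllib.parse import urlencode

# Assumed mapping from the Advanced Search text boxes to URL parameters.
ADVANCED_FIELDS = {
    "all_words": "as_q",       # words that must appear
    "exact_phrase": "as_epq",  # exact phrase
    "any_words": "as_oq",      # at least one of these words
    "without_words": "as_eq",  # words to exclude
}


def advanced_search_url(**boxes):
    """Return a search URL with only the filled-in boxes included."""
    unknown = set(boxes) - set(ADVANCED_FIELDS)
    if unknown:
        raise ValueError("unknown box(es): " + ", ".join(sorted(unknown)))
    params = [(field, boxes[box])
              for box, field in ADVANCED_FIELDS.items()
              if boxes.get(box)]
    return "http://www.google.com/search?" + urlencode(params)


print(advanced_search_url(all_words="snowblower",
                          exact_phrase="Green Bay",
                          without_words="Honda"))
```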
Using the Language pull-down menu, you can specify what language all returned pages must be in, from Arabic to Turkish.
Google's Advanced Search further gives you the option to filter your results using SafeSearch. SafeSearch filters only explicit sexual content (as opposed to some filtering systems that filter pornography, hate material, gambling information, etc.). Please remember that machine filtering isn't 100% perfect.
The file format option lets you include or exclude several different Microsoft file formats, including Word and Excel. There are a couple of Adobe formats (most notably PDF) and Rich Text Format as options here too. This is where the Advanced Search is at its most limited; there are literally dozens of file formats that Google can search for, and this set of options represents only a small subset.
Additional content appearing in this section has been removed.
Setting Preferences
Customize the way you search Google.
Google's preferences provide a nice, easy way to set your searching preferences from this moment forward.
You can set your Interface Language, affecting the language in which tips and messages are displayed. Language choices range from Afrikaans to Welsh, with plenty of odd options including Bork Bork Bork! (the Swedish Chef), Elmer Fudd, and Pig Latin thrown in for fun. Not to be confused with Interface Language, Search Language restricts which languages are considered when searching Google's page index. The default is any language, but you could limit results to web pages written in Chinese and Japanese, or French, German, and Spanish; the combination is up to you. Figure 1-1 shows the page through which you can set your language preferences.
Figure 1-1: Language Tools page
Google's SafeSearch filtering affords you a method of avoiding search results that may offend your sensibilities. The default is no filtering. Moderate filtering rules out explicit images, but not explicit language. Strict filtering filters both on text and images.
Google, by default, displays 10 results per page. For more results, click any of the "Result Page: 1 2 3..." links at the bottom of each result page, or simply click the "Next" link.
You can specify your preferred number of results per page (10, 20, 30, 50, 100) along with whether you want results to open up in the current or a new browser window.
For the purpose of research, it's best to have as many search results as possible on the page. Because it's all text, it doesn't take that much longer to load 100 results than it does 10. If you have a computer with a decent amount of memory, it's also good to have search results open in a new window; it'll keep you from losing your place and leave you a window with all the search results constantly available.
Additional content appearing in this section has been removed.
Language Tools
While you shouldn't rely on Google's language tools to do 100% accurate translations of web pages, they can help you in your searches.
In the early days of the Web, it seemed like most web pages were in English. But as more and more countries have come online, materials have become available in a variety of languages—including languages that don't originate with a particular country (such as Esperanto and Klingon).
Google offers several language tools, including one for translation and one for Google's interface. The interface option is much more extensive than the translation option, but the translation has a lot to offer.
The language tools are available by clicking "Language Tools" on the front page or by going to http://www.google.com/language_tools?hl=en.
The first tool allows you to search for materials from a certain country and/or in a certain language. This is an excellent way to narrow your searches; searching for French pages from Japan gives you far fewer results than searching for French pages from France. You can narrow the search further by searching for a slang word in another language. For example, search for the English slang word "bonce" on French pages from Japan.
The second tool on this page allows you to translate either a block of text or an entire web page from one language to another. Most of the translations are to and from English.
Machine translation is not nearly as good as human translation, so don't rely on this translation as either the basis of a search or as a completely accurate translation of the page you're looking at. Rely on it instead to give you the "gist" of whatever it translates.
You don't have to come to this page to use the translation tools. When you enter a search, you'll see that some search results that aren't in your language of choice (which you set via Google's preferences) have "[Translate this page]" next to their titles. Click on one of those and you'll be presented with a framed, translated version of the page. The Google frame, at the top, gives you the option of viewing the original version of the page, as well as returning to the results or viewing a copy suitable for printing.
Additional content appearing in this section has been removed.
Anatomy of a Search Result
Going beyond the obvious in reading Google search results.
You'd think a list of search results would be pretty straightforward, wouldn't you—just a page title and a link, possibly a summary? Not so with Google. Google encompasses so many search properties and has so much data at its disposal that it fills every results page to the rafters. Within a typical search result you can find sponsored links, ads, links to stock quotes, page sizes, spelling suggestions, and more.
By knowing more of the nitty-gritty details of what's what in a search result, you'll be able to make some guesses ("Wow, this page that links to my page is very large; perhaps it's a link list") and correct roadblocks ("I can't find my search term on this page; I'll check the version Google has cached"). Furthermore, if you have a good idea what Google provides on its standard search results page, you'll have more of an idea of what's available to you via the Google API.
Let's use the word "flowers" to examine this anatomy. Figure 1-2 shows the result page for flowers.
Figure 1-2: Result page for "flowers"
First, you'll note at the top of the page is a selection of tabs, allowing you to repeat your search across other Google searches, including Google Groups [Hack #30], Google Images [Hack #31], and the Google Directory. Beneath that you'll see a count for the number of results and how long the search took.
Sometimes you'll see results/sites called out on colored backgrounds at the top or right of the results page. These are called "sponsored links" (read: advertisements). Google has a policy of very clearly distinguishing ads and sticking only to text-based advertising rather than throwing flashing banners in your face like many other sites do.
Beneath the sponsored links you'll sometimes see a category list. The category for flowers is Shopping
Additional content appearing in this section has been removed.
Specialized Vocabularies: Slang and Terminology
Your choice of words can make a big difference to the search results you get with Google.
When a teenager says something is "phat," that's slang—a specialized vocabulary for a certain section of the world culture. When a copywriter scribbles "stet" on an ad, that's not slang, but it's still specialized vocabulary for a certain section of the world culture—in this case, the advertising industry.
We have distinctive speech patterns that are shaped by our educations, our families, and where we live. Further, we may use another set of words based on our occupation.
Being aware of these specialty words can make all the difference in the world when it comes to searching. Adding specialized words to your search query—whether slang or industry vocabulary—can really change the slant of your search results.
Slang gives you one more way to break up your search engine results into geographically distinct areas. There's some geographical blurriness when you use slang to narrow your search engine results, but it's amazing how well it works. For example, search Google for football. Now search for football bloke. Totally different results set, isn't it? Now search for football bloke bonce. Now you're into soccer narratives.
Of course, this is not to say that everyone in England automatically uses the word "bloke" any more than everyone in the southern U.S. automatically uses the word "y'all." But adding well-chosen bits of slang (which will take some experimentation) will give a whole different tenor to your search results and may point you in unexpected directions. You can find slang from the following resources:
The Probert Encyclopedia—Slang
http://www.probertencyclopaedia.com/slang.htm
This site is browseable by first letter or searchable by keyword. (Note that the keyword search covers the entire Probert Encyclopedia—slang results are near the bottom.) Slang is from all over the world. It's often crosslinked, especially drug slang. As with most slang dictionaries, this site will contain materials that might offend.
Additional content appearing in this section has been removed.
Getting Around the 10 Word Limit
There are some clever ways around Google's limit of 10 words to a query.
Unless you're fond of long, detailed queries, you might never have noticed that Google has a hard limit of 10 words—that's keywords and special syntaxes combined—summarily ignoring anything beyond. While this has no real effect on casual Google users, search-hounds quickly find this limit rather cramps their style.
Whatever shall you do?
By limiting your query to the more obscure of your keywords or phrase fragments, you'll hone results without squandering precious query words. Let's say you're interested in a phrase from Hamlet: "The lady doth protest too much, methinks." At first blush, you might simply paste the entire phrase into the query field. But that's seven of your 10 allotted words right there, leaving no room for additional query words or search syntax.
The first thing to do is ditch the first couple of words; "The lady" is just too common a phrase. This leaves the five-word "doth protest too much, methinks." Neither "methinks" nor "doth" is a word you might hear every day, so either provides a nice Shakespearean anchor for the phrase. That said, one or the other should suffice, leaving the query at an even four words with room to grow:
"protest too much methinks"
or:
"doth protest too much"
Either of these will provide you, within the first five results, origins of the phrase and pointers to more information.
Unfortunately, this technique won't do you much good in the case of "Do as I say not as I do," which doesn't provide much in the way of obscurity. Attempt clarification by adding something like quote origin English usage and you're stepping beyond the ten-word limit.
Help comes in the form of Google's full-word wildcard [Hack #13]. It turns out that Google doesn't count wildcards toward the limit.
So when you have more than 10 words, substitute a wildcard for common words like so:
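As an illustration only, and not the book's own example, the substitution can be sketched in Python. It assumes the full-word wildcard is written as an asterisk (*) and that the 10-word limit counts every whitespace-separated word except wildcards. The list of "common" words and both function names are made up for this sketch.

```python
# Hypothetical list of words considered too common to spend a query slot on.
COMMON_WORDS = {"a", "an", "as", "i", "of", "the", "to"}


def counted_words(query):
    """Count the words that would count toward the limit (wildcards excluded)."""
    tokens = query.replace('"', " ").split()
    return sum(1 for token in tokens if token != "*")


def wildcard_common(phrase, common=COMMON_WORDS):
    """Replace each common word in the phrase with a full-word wildcard."""
    return " ".join("*" if word.lower() in common else word
                    for word in phrase.split())


before = '"do as I say not as I do" quote origin English usage'
after = '"' + wildcard_common("do as I say not as I do") + '" quote origin English usage'

print(counted_words(before))  # 12
print(after)                  # "do * * say not * * do" quote origin English usage
print(counted_words(after))   # 8
```

Under these assumptions the query drops from 12 counted words to 8, leaving room for two more.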
Additional content appearing in this section has been removed.
Word Order Matters
Rearranging your query can have quite an effect.
Who would have thought it? The order in which you put your keywords in a Google query can be every bit as important as the query words themselves. Rearranging a query can change not only your overall result count but also what results rise to the top. While one might expect this of quote-enclosed phrases—"have you any wool" versus "wool you any have"—it may come as a surprise that it also affects sets of individual query words.
Google does warn you of this right up front: "Keep in mind that the order in which the terms are typed will affect the search results." Yet it provides little in the way of explanation or suggestion as to how best to formulate a query to take full advantage of this fact.
A little experimentation is definitely in order.
Search for the words (but not as a quote-enclosed phrase) hey diddle diddle. Figure 1-4 shows the results.
Figure 1-4: Result page for "hey diddle diddle"
The top results, as expected, do include the phrase "hey diddle diddle."
Now give diddle hey diddle a whirl. Again, it should come as no surprise that the first result contains the phrase "diddle hey diddle." Figure 1-5 shows the results.
Figure 1-5: Result page for "diddle hey diddle"
Finally, search for diddle diddle hey (Figure 1-6).
Figure 1-6: Result page for "diddle diddle hey"
Another set of results, though this time it isn't clear that Google is finding the phrase "diddle diddle hey" first. (It does show up in the third result's snippet.)
It appears that even if you don't specify a search as a phrase, Google accords any occurrence of the words as a phrase greater weight and more prominence. This is followed by measures of adjacency between the words and then, finally, the weights of the individual words themselves.
Additional content appearing in this section has been removed.
Repetition Matters
Repetition matters when it comes to weighting the keywords in your queries.
Using keywords multiple times can have an impact on the types and number of results you get.
Don't believe me? Try searching for internet. At the time of this writing Microsoft was the first result. Now try searching for internet internet. At this writing Yahoo! popped to the top. Experiment with this using other words, putting additional query words in if you want to. You'll see that repeated query words can have an impact on how the search results are ordered and on the number of results returned.
Google doesn't talk about this on their web site, so this hack is the result of some conjecture and much experimentation.
First, enter a word one time. Let's use clothes as an example (Figure 1-7). This returns 7,050,000 results, the top being a site called "The Emperor's New Clothes." Let's add another clothes to the query (Figure 1-8). The number of results drops dramatically to 3,490,000, and the first result is for a clothing store. Some different finds move their way up into the top 10 results.
Figure 1-7: Result page for "clothes"
Figure 1-8: Result page for "clothes clothes"
Why stop now? Try clothes clothes clothes (Figure 1-9). The result order and results themselves remain the same.
Figure 1-9: Result page for "clothes clothes clothes"
Here's a theory: Google searches for as many matches as it can for each word or phrase you specify, stopping when it can't find any more. So clothes clothes returns pages with two occurrences of the word "clothes."
Additional content appearing in this section has been removed.
Mixing Syntaxes
What combinations of search syntaxes will and will not fly in your Google search?
There was a time when you couldn't "mix" Google's special syntaxes [Section 1.5]—you were limited to one per query. And while Google released ever more powerful special syntaxes, not being able to combine them for their composite power stunted many a search.
This has since changed. While there remain some syntaxes that you just can't mix, there are plenty to combine in clever and powerful ways. A thoughtful combination can do wonders to narrow a search.
The antisocial syntaxes are the ones that won't mix and should be used individually for maximum effect. If you try to use them with other syntaxes, you won't get any results.
The syntaxes that request special information—stocks: [Hack #18], rphonebook:, bphonebook:, and phonebook: [Hack #17]—are all antisocial syntaxes. You can't mix them and expect to get a reasonable result.
The other antisocial syntax is the link: syntax. The link: syntax shows you which pages have a link to a specified URL. Wouldn't it be great if you could specify what domains you wanted the pages to be from? Sorry, you can't. The link: syntax does not mix.
For example, say you want to find out what pages link to O'Reilly & Associates, but you don't want to include pages from the .edu domain. The query link:www.oreilly.com -site:edu will not work, because the link: syntax doesn't mix with anything else. Well, that's not quite correct. You will get results, but they'll be for the phrase "link www.oreilly.com" from domains that are not .edu.
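If you generate queries in a script, a quick check can catch this mistake before the query is sent. The Python sketch below flags any of the antisocial syntaxes named above when it shares a query with other terms. It is deliberately naive: it splits on whitespace, so a single syntax followed by several space-separated values would be flagged too. The function name is hypothetical.

```python
# The antisocial syntaxes named in the text above.
ANTISOCIAL = ("link:", "stocks:", "phonebook:", "rphonebook:", "bphonebook:")


def antisocial_conflicts(query):
    """Return the antisocial syntaxes that are mixed with other query terms."""
    tokens = query.split()
    found = [token.lower().split(":", 1)[0] + ":"
             for token in tokens
             if token.lower().startswith(ANTISOCIAL)]
    # An antisocial syntax used entirely on its own is fine.
    if found and len(tokens) > 1:
        return found
    return []


print(antisocial_conflicts("link:www.oreilly.com -site:edu"))  # ['link:']
print(antisocial_conflicts("link:www.oreilly.com"))            # []
```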
If you want to search for links and exclude the domain .edu, you have a couple of options. First, you can scrape the list of results [Hack #44] and sort it in a spreadsheet to remove the
Additional content appearing in this section has been removed.
Hacking Google URLs
Hacking the URL Google hands you in response to a search.
When you think of hacks you might think of making a cool search form or performing a particularly intricate search. But you can also hack search results by hacking the URL that Google returns after a search. There's at least one thing you can do by hacking the URL that you can do no other way, and there are quick tricks you can do that might save you a trip back to the advanced preferences page otherwise.
Say you want to search for three blind mice. Your result URL will vary depending on the preferences you've set, but the results URL will look something like this:
http://www.google.com/search?num=100&hl=en&q=%22three+blind+mice%22
The query itself—&q=%22three+blind+mice%22, %22 being a URL-encoded " (double quote)—is pretty obvious, but let's break down what those extra bits mean.
num=100 refers to the number of search results to a page, 100 in this case. Google accepts any number from 1 to 100. Altering the value of num is a nice shortcut to altering the preferred size of your result set without having to meander over to the Advanced Search page and rerun your search.
Don't see the num= in your query? Simply append it to your query URL using any value between 1 and 100.
You can add or alter any of the modifiers described here by simply appending them to the URL or changing their values—the part after the = (equals)—to something within the accepted range for the modifier in question.
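The same edit can be made programmatically. Here is a minimal sketch using Python's standard urllib.parse module; the function names set_param and set_num are made up for this example, and the 1 to 100 range check simply mirrors the limit described above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_param(url, name, value):
    """Return url with the modifier `name` set to `value`, appending it if absent."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    updated, replaced = [], False
    for key, old in pairs:
        if key != name:
            updated.append((key, old))
        elif not replaced:
            # Replace the first occurrence and drop any later duplicates.
            updated.append((key, str(value)))
            replaced = True
    if not replaced:
        updated.append((name, str(value)))
    return urlunsplit(parts._replace(query=urlencode(updated)))


def set_num(url, results_per_page):
    """Set num=, refusing values outside the accepted 1 to 100 range."""
    if not 1 <= results_per_page <= 100:
        raise ValueError("num must be between 1 and 100")
    return set_param(url, "num", results_per_page)


url = "http://www.google.com/search?num=100&hl=en&q=%22three+blind+mice%22"
print(set_num(url, 50))
# http://www.google.com/search?num=50&hl=en&q=%22three+blind+mice%22
```

Decoding and re-encoding the query string keeps the %22 and + escapes intact, so the search itself is unchanged.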
hl=en means the language interface—the language in which you use Google, reflected in the home page, messages, and buttons—is in English (at least mine is). Google's Language Tools page [Hack #2] provides a list of language choices. Run your mouse over each and notice the change reflected in the URL; the one for Pig Latin looks like this:
Additional content appearing in this section has been removed.
Hacking Google Search Forms
Build your own personal, task-specific Google search form.
If you want to do a simple search with Google, you don't need anything but the standard Simple Search form (the Google home page). But if you want to craft specific Google searches you'll be using on a regular basis or providing for others, you can simply put together your own personalized search form.
Start with your garden variety Google search form; something like this will do nicely:
<!-- Search Google -->
<form method="get" action="http://www.google.com/search">
<input type="text" name="q" size="31" maxlength="255" value="">
<input type="submit" name="sa" value="Search Google">
</form>
<!-- /Search Google -->
This is a very simple search form. It takes your query and sends it directly to Google, adding nothing to it. But you can embed some variables to alter your search as needed. You can do this two ways: via hidden variables or by adding more input to your form.
As long as you know how to identify a search option in Google, you can add it to your search form via a hidden variable. The fact that it's hidden just means that form users will not be able to alter it. They won't even be able to see it unless they take a look at the source code. Let's take a look at a few examples.
While it's perfectly legal HTML to put your hidden variables anywhere between the opening and closing <form> tags, it's tidier and more useful to keep them all together after all the visible form fields.
File type
As the name suggests, the file type option filters your results by a particular file type (e.g., Word DOC, Adobe PDF, PowerPoint PPT, plain text TXT). Add a PowerPoint file type filter, for example, to your search form like so:
<input type="hidden" name="as_filetype" value="PPT">
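A hidden variable simply becomes one more name=value pair in the URL the browser requests when the form is submitted. The following Python sketch, with the hypothetical helper name form_submission_url, builds roughly the URL that the form above would produce for a given query. It is an illustration of the mechanism under that assumption, not part of the form itself, and it leaves out the sa pair that the submit button would also contribute.

```python
from urllib.parse import urlencode

# The action attribute of the search form shown above.
FORM_ACTION = "http://www.google.com/search"


def form_submission_url(query, hidden=None):
    """Sketch of the GET request a browser builds from the form.

    `hidden` maps hidden-variable names to their values, e.g.
    {"as_filetype": "PPT"}. The submit button's own name=value pair
    (sa) is omitted here for brevity.
    """
    fields = [("q", query)]
    fields.extend((hidden or {}).items())
    return FORM_ACTION + "?" + urlencode(fields)


if __name__ == "__main__":
    print(form_submission_url("sales forecast", {"as_filetype": "PPT"}))
```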
Date-Range Searching
An undocumented but powerful feature of Google's search and API is the ability to search within a particular date range.
Before delving into the actual use of date-range searching, there are a few things you should understand. The first is this: a date-range search has nothing to do with the creation date of the content and everything to do with the indexing date of the content. If I create a page on March 8, 1999, and Google doesn't get around to indexing it until May 22, 2002, for the purposes of a date-range search, the date in question is May 22, 2002.
The second thing is that Google can index a page more than once, and the date attached to it can change when it does. Whether the daterange: timestamp actually changes depends on whether the content of the page has changed. So don't count on a date-range search staying consistent from day to day.
Third, Google doesn't "stand behind" the results of a search done using the date-range syntaxes. So if you get a weird result, you can't complain to them. Google would rather you use the date-range options on their advanced search page, but that page allows you to restrict your results only to the last three months, six months, or year.
Why would you want to search by daterange:? There are several reasons:
  • It narrows down your search results to fresher content. Google might find some obscure, out-of-the-way page and index it only once. Two years later this obscure, never-updated page is still turning up in your search results. Limiting your search to a more recent date range will result in only the most current of matches.
  • It helps you dodge current events. Say John Doe sets a world record for eating hot dogs and immediately afterward rescues a baby from a burning building. Less than a week after that happens, Google's search results are going to be filled with John Doe. If you're searching for information on (another) John Doe, babies, or burning buildings, you'll scarcely be able to get rid of him.
Understanding and Using Julian Dates
Get to know and use Julian Dates.
Date-based searching good! Date-based searching with Julian dates annoying (for a human, anyway)!
The Julian date is the number of days that have passed since January 1, 4713 BC. Unlike Gregorian dates, which begin at midnight, Julian days begin at noon, making them useful for astronomers.
A Julian date is just one number. It's not broken up into month, day, and year. That makes it problematic for humans but handy for computer programming, because to change dates, you simply have to add and subtract from one number, and not worry about month and year changes.
To use Google's date-range syntax in Perl, you'll need a way to convert the computer's local time to Julian. You can use the module Time::JulianDay, which offers a variety of ways to manipulate local time in Julian format. You can get the module and more information at http://search.cpan.org/search?query=Time%3A%3AJulianDay.
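If Perl isn't your language, the same conversion is easy to sketch in Python using only the standard library. The snippet below is an illustration rather than a port of Time::JulianDay. It relies on the fact that Python's date.toordinal() counts days from January 1 of year 1, so adding a fixed offset of 1,721,425 gives the whole-number Julian day for a Gregorian calendar date. The function names are invented for this example, and the daterange:start-end form of the query term is an assumption here; see [Hack #11] for the syntax itself.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal (date.toordinal())
# and the whole-number Julian day; e.g. January 1, 2000 is 2451545.
JULIAN_OFFSET = 1721425


def to_julian_day(d):
    """Whole-number Julian day for calendar date d.

    Julian days begin at noon; this is the number of the Julian day
    that begins at noon on d.
    """
    return d.toordinal() + JULIAN_OFFSET


def from_julian_day(jd):
    """Calendar date for a whole-number Julian day."""
    return date.fromordinal(jd - JULIAN_OFFSET)


def daterange_term(start, end):
    """Assumed query form: daterange:<start>-<end>, both Julian."""
    return "daterange:%d-%d" % (to_julian_day(start), to_julian_day(end))


if __name__ == "__main__":
    print(to_julian_day(date(2000, 1, 1)))
    print(daterange_term(date(2000, 1, 1), date(2000, 1, 2)))
```

Because a Julian date is just one number, moving a range forward or back by a week is a matter of adding or subtracting 7.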
Hacks that use the Julian date format and date-range searching pop up throughout this book; start by learning more about using the date-range syntax [Hack #11]. Also included are hacks for building recent searches into a customized form [Hack #42], and date-range searches with a client-side application [Hack #60].
Using Full-Word Wildcards
Google's full-word wildcard stands in for any keyword in a query.
Some search engines support a technique called "stemming." Stemming is adding a wildcard character—usually * (asterisk) but sometimes ? (question mark)—to part of your query, requesting the search engine return variants of that query using the wildcard as a placeholder for the rest of the word at hand. For example, moon* would find: moons, moonlight, moonshot, etc.
Google doesn't support stemming.
Instead, Google offers the full-word wildcard. While you can't have a wildcard stand in for part of a word, you can insert a wildcard (Google's wildcard character is *) into a phrase and have the wildcard act as a substitute for one full word. Searching for "three * mice", therefore, finds: three blind mice, three blue mice, three green mice, etc.
What good is the full-word wildcard? It's certainly not as useful as stemming, but then again, it's not as confusing to the beginner. One * is a stand-in for one word; two * signify two words, and so on. The full-word wildcard comes in handy in the following situations:
  • Avoiding the 10-word limit [Hack #5] on Google queries. You'll most frequently run into these examples when you're trying to find song lyrics or a quote; plugging the phrase "Fourscore and seven years ago, our forefathers brought forth on this continent" into Google will search only as far as the word "on"; every word after that will be ignored by Google.
  • Checking the frequency of certain phrases and derivatives of phrases, like: intitle:"methinks the * doth protest too much" and intitle:"the * of Seville".
  • Filling in the blanks on a fitful memory. Perhaps you remember only a short string of song lyrics; search only using what you remember rather than randomly reconstructed full lines.
Let's take as an example the disco anthem "Good Times" by Chic. Consider the line: "You silly fool, you can't change your fate."
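To get a feel for the one-asterisk-per-word rule, here is a small Python sketch that mimics it locally with a regular expression. It says nothing about how Google actually matches pages; it only models the rule as described above, and the function names are made up for the example.

```python
import re


def wildcard_to_regex(phrase):
    """Turn a phrase with full-word wildcards into a regular expression.

    Each * stands in for exactly one word; every other word must
    appear literally (case is ignored).
    """
    parts = [r"[\w']+" if word == "*" else re.escape(word)
             for word in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def matches(phrase, text):
    """True if text contains the phrase, wildcards included."""
    return wildcard_to_regex(phrase).search(text) is not None


if __name__ == "__main__":
    print(matches("three * mice", "Three blind mice"))       # one word fills the *
    print(matches("three * mice", "three very blind mice"))  # two words do not
```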
inurl: Versus site:
Use inurl: syntax to search site subdirectories.
The site: special syntax is perfect for those situations in which you want to restrict your search to a certain domain or domain suffix like "example.com," "www.example.org," or "edu": site:edu. But it breaks down when you're trying to search for a site that exists beneath the main or default site (i.e., in a subdirectory like /~sam/album/).
For example, if you're looking for something below the main GeoCities site, you can't use site: to find all the pages in http://www.geocities.com/Heartland/Meadows/6485/; Google will return no results. Enter inurl:, a Google special syntax [Section 1.5] for specifying a string to be found in a resultant URL. That query, then, would work as expected like so:
inurl:www.geocities.com/Heartland/Meadows/6485/
While the http:// prefix in a URL is summarily ignored by Google when used with site:, search results come up short when including it in an inurl: query. Be sure to remove prefixes in any inurl: query for the best (read: any) results.
You'll see that using the inurl: query instead of the site: query has two immediate advantages:
  • You can use inurl: by itself without using any other query words (which you can't do with site:).
  • You can use it to search subdirectories.
You can also use inurl: in combination with the site: syntax to get information about subdomains. For example, how many subdomains does O'Reilly.com really have? You can't get that information via the query site:oreilly.com, but neither can you get it just from the query inurl:"*.oreilly.com" (because that query will pick up mirrors and other pages containing the string oreilly.com that aren't at the O'Reilly site).
However, this query will work just fine:
site:oreilly.com inurl:"*.oreilly" -inurl:"www.oreilly" 
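If you find yourself building these queries often, the two patterns in this hack are easy to generate. The Python sketch below uses invented helper names: one strips the http:// (or any other scheme) prefix before building an inurl: query, and the other reproduces the shape of the subdomain query shown above for a given domain. It assumes a simple two-part domain such as oreilly.com.

```python
import re


def inurl_query(url):
    """Build an inurl: query, dropping any scheme prefix such as http://."""
    bare = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url.strip())
    return "inurl:" + bare


def subdomain_query(domain):
    """Build the site:/inurl: combination used above to list subdomains.

    Assumes a simple name.suffix domain such as oreilly.com.
    """
    name = domain.rsplit(".", 1)[0]
    return 'site:%s inurl:"*.%s" -inurl:"www.%s"' % (domain, name, name)


if __name__ == "__main__":
    print(inurl_query("http://www.geocities.com/Heartland/Meadows/6485/"))
    print(subdomain_query("oreilly.com"))
```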
Checking Spelling
Google sometimes takes the liberty of "correcting" what it perceives to be a spelling error in your query.
If you've ever used other Internet search engines, you'll have experienced what I call "stupid spellcheck." That's when you enter a proper noun and the search engine suggests a completely ludicrous query ("Elvish Parsley" for "Elvis Presley"). Google's quite a bit smarter than that.
When Google thinks it can spell individual words or complete phrases in your search query better than you can, it'll offer you a suggested "better" search, hyperlinking it directly to a query. For example, if you search for hydrocephelus, Google will suggest that you search instead for hydrocephalus.
Suggestions aside, Google will assume you know of what you speak and return your requested results. Provided, that is, that your query gleaned results.
If your query found no results for the spellings you provided and Google believes it knows better, it will automatically run a new search on its own suggestions. Thus, a search for hydracefallus finding (hopefully) no results will spark a Google-initiated search for hydrocephalus.
Mind you, Google does not arbitrarily come up with its suggestions, but builds them based on its own database of words and phrases found while indexing the Web. If you search for nonsense like garafghafdghasdg, you'll get no results and be offered no suggestions, as Figure 1-10 shows.
Figure 1-10: A search that yields no suggestions
This is a lovely side effect and a quick and easy way to check the relative frequency of spellings. Query for a particular spelling, making note of the number of results. Then click on Google's suggested spelling and note the number of results. It's surprising how close the counts are sometimes, indicating an oft-misspelled word or phrase.
Don't make the mistake of automatically dismissing the proffered results from a misspelled word, particularly a proper name. I've been a fan of cartoonist Bill Mauldin for years now, but I continually misspell his name as "Bill Maudlin." And judging from a quick Google search I'm not the only one. There is no law saying that every page must be spellchecked before it goes online, so it's often worth taking a look at results despite misspellings.
Consulting the Dictionary
Google, in addition to its own spellchecking index, provides hooks into Dictionary.com.
Google's own spellchecking [Hack #15] is built upon its own word and phrase database gleaned while indexing web pages. Thus it provides suggestions for lesser-known proper names, phrases, common sentence constructs, etc. Google also offers a definition service powered by Dictionary.com (http://www.dictionary.com/). Definitions, while coming from a credible source and augmented by various specialty indexes, can be more limited.
Run a search. You'll notice on the results page the phrase "Searched the web for [query words]." If the query words would appear in a dictionary, they will be hyperlinked to a dictionary definition. Identified phrases will be linked as a phrase; for example, the query "jolly roger" will allow you to look up the phrase "jolly roger." On the other hand, the phrase "computer legal" will allow you to look up the separate words "computer" and "legal."
The definition search will sometimes fail on obscure words, very new words, slang, and technical vocabularies (otherwise known as specialized slang). If you search for a word's meaning and Google can't help you, try enlisting the services of a metasearch dictionary, like OneLook (http://www.onelook.com/), which indexes over 4 million words in over 700 dictionaries. If that doesn't work, try Google again with one of the following tricks, queryword being the word you want to find:
  • If you're searching for several words—you're reading a technical manual, for example—search for several of the words at the same time. Sometimes you'll find a glossary this way. For example, maybe you're reading a book about marketing, and you don't know many of the words. If you search for storyboard stet SAU, you'll get only a few search results, and they'll all be glossaries.
  • Try searching for your word and the word glossary; say, stet glossary.
Consulting the Phonebook
Google makes an excellent phonebook, even to the extent of doing reverse lookups.
Google combines residential and business phone number information with its own excellent interface to offer a phonebook lookup covering businesses and residences in the United States. There are catches, however: the search offers three different syntaxes, the amount of information you supply changes the results you get, the syntaxes are finicky, and Google doesn't provide any documentation.
Google offers three ways to search its phonebook:
  • phonebook: searches the entire Google phonebook.
  • rphonebook: searches residential listings only.
  • bphonebook: searches business listings only.
The result page for phonebook: lookups lists only five results, residential and business combined. The more specific rphonebook: and bphonebook: searches provide up to 30 results per page. For a better chance of finding what you're looking for, use the appropriate targeted lookup.
Using a standard phonebook requires knowing quite a bit of information about what you're looking for: first name, last name, city, and state. Google's phonebook requires no more than last name and state to get it started. Casting a wide net for all the Smiths in California is as simple as:
phonebook:smith ca 
Try giving 411 a whirl with that request! Figure 1-11 shows the results of the query.
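Since the three syntaxes differ only in their prefix, switching between them is easy to script. Here is a minimal Python sketch; the function name and the "all", "residential", and "business" labels are invented for the example, and the query shape simply follows the phonebook:smith ca example above.

```python
# Labels (made up for this sketch) mapped to the three phonebook syntaxes.
PHONEBOOK_SYNTAXES = {
    "all": "phonebook",
    "residential": "rphonebook",
    "business": "bphonebook",
}


def phonebook_query(last_name, state, kind="all"):
    """Build a phonebook lookup from a last name and a state."""
    return "%s:%s %s" % (PHONEBOOK_SYNTAXES[kind], last_name, state)


if __name__ == "__main__":
    print(phonebook_query("smith", "ca"))
    print(phonebook_query("smith", "ca", kind="residential"))
```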
Tracking Stocks
A well-crafted Google query will usually net you company information beyond that provided by traditional stock services.
Among the lesser-known pantheon of Google syntaxes is stocks:. Searching for stocks: symbol, where symbol represents the stock you're looking for, will redirect you to Yahoo! Finance (http://finance.yahoo.com/) for details. The Yahoo! page is actually framed by Google; off to the top-left is the Google logo, along with links to Quicken, Fool.com, MSN MoneyCentral, and other financial sites.
Feed Google a bum stocks: query and you'll still find yourself at Yahoo! Finance, usually staring at a quote for a stock you've never even heard of or a "Stock Not Found" page. Of course, you can use this to your advantage. Enter stocks: followed by the name of a company you're looking for (e.g., stocks:friendly). If the company's name is more than one word, choose the most distinctive word. Run your query and you'll arrive at the Yahoo! Finance stock lookup page shown in Figure 1-12.
Figure 1-12: Yahoo! Finance stock lookup page
Notice the "Look up: FRIENDLY" link; click it and you'll be offered a list of companies that match "friendly" in some way. From there you can get the stock information you want (assuming the company you wanted is on the list).
Google isn't particularly set up for basic stock research. You'll have to do your initial groundwork elsewhere, returning to Google armed with a better understanding of what you're looking for. I recommend going straight to Yahoo! Finance (http://finance.yahoo.com) to quickly look up stocks by symbol or company name; there you'll find all the basics: quotes, company profiles, charts, and recent news. For more in-depth coverage, I heartily recommend Hoovers (http://www.hoovers.com). Some of the information is free. For more depth, you'll have to pay a subscription fee.
Google Interface for Translators
Create a customized search form for language translation.
If you do a lot of the same kind of research every day, you might find that a customized search form makes your job easier. Spend enough time on it, and it may become elaborate enough to be useful to other people as well.
WWW Search Interfaces for Translators (http://www.multilingual.ch) offers three different tools for finding material of use to translators. Created by Tanya Harvey Ciampi from Switzerland, the tools are available in AltaVista and Google flavors. A user-defined query term is combined with a set of specific search criteria to narrow down the search to yield highly relevant results.
The first tool, shown in Figure 1-14, finds glossaries. The pull-down menu finds synonyms of the word "glossary" in various parts of a search result (title, URL, or anywhere). For example, imagine having to seek out numerous specialized computer dictionaries before finding one containing a definition of the term "firewall." This glossary search tool spares you the work by setting a clear condition: "Find a glossary that contains my term!"
Figure 1-14: WWW Search Interfaces for Translators glossary tool
If you're getting too many results for the glossary word you searched for, try restricting it to the title of the results; instead of searching for firewall, try searching for intitle:firewall.
The second tool, shown in Figure 1-15, finds "parallel texts," identical pages in two or more languages, useful for multilingual terminology research.
Figure 1-15: WWW Search Interfaces for Translators parallel text tool
Finding pages in two or more languages is not easy; one of the few places to do it easily is with Canadian government pages, which are available in French and English. This tool provides several different search combinations between SL (source language) and TL (target language).
Searching Article Archives
Google serves as a handy searchable archive for back issues of online publications.
Not all sites have their own search engines, and even the ones that do are sometimes difficult to use. Complicated or incomplete search engines are more pain than gain when attempting to search through archives of published articles. If you follow a couple of rules, Google is handy for finding back issues of published resources.
The trick is to use a common phrase to find the information you're looking for. Let's use the New York Times as an example.