Chapter 1. Web

Google’s front page is deceptively simple: a search form and a couple of buttons. Yet that basic interface—so alluring in its simplicity—belies the power of the Google engine underneath and the wealth of information at its disposal. If you use Google’s search syntax to its fullest, the Web is your oyster.

Searching in Google doesn’t have to be a case of just entering what you’re looking for in the search box and hoping for the best. Google offers you many ways—via special syntax and search options—to refine your search criteria and help Google better understand what you’re looking for. We’ll dig into Google’s powerful, all-but-undocumented special syntax and search options, and show how to use them to their fullest. We’ll cover the basics of Google searching, wildcards, word limits, syntax for special cases, mixing syntax elements, advanced search techniques, and using specialized vocabularies, including slang and jargon.

Google Web Search Basics

Whenever you search for more than one keyword at a time, a search engine has a default strategy for handling and combining those keywords. Can those words appear individually anywhere in a page, or do they have to be right next to each other? Will the engine search for both keywords or for either keyword?

Phrase Searches

Google defaults to searching for occurrences of your specified keywords anywhere in the page, whether side by side or scattered throughout. To find only pages that contain the words in a specific order, enclose them in quotes, turning your keyword search into a phrase search, to use Google’s terminology.

On entering a search for the keywords:

to be or not to be

Google will find matches where the keywords appear anywhere on the page. If you want Google to find you matches where the keywords appear together as a phrase, surround them with quotes, like this:

"to be or not to be"

Google will return only pages in which those words appear together, in that order (and, as a bonus, it will explicitly include stop words such as “to” and “or”; see the section “Explicit Inclusion” a little later).

Phrase searches are also useful when you want to find a phrase but aren’t quite sure of the exact wording. This is accomplished in combination with wildcards, explained later in the chapter in “Full-Word Wildcards.”
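To make the difference concrete, here is a toy model in Python. It is an illustration only, not how Google actually matches pages: a keyword search just checks that every word occurs somewhere on the page, while a phrase search looks for the words side by side, in order. The function names and sample pages are hypothetical.

```python
import re
from typing import List

def tokenize(text: str) -> List[str]:
    # Lowercase everything (Google is case-insensitive) and keep only words.
    return re.findall(r"[a-z0-9']+", text.lower())

def matches_keywords(page: str, query: str) -> bool:
    """Keyword search: every query word appears somewhere, in any order."""
    words = set(tokenize(page))
    return all(w in words for w in tokenize(query))

def matches_phrase(page: str, phrase: str) -> bool:
    """Phrase search: the query words appear consecutively, in order."""
    tokens, target = tokenize(page), tokenize(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))

page_a = "To be, or not to be: that is the question."
page_b = "Not everyone wants to be or to become what others expect."

print(matches_keywords(page_b, "to be or not to be"))  # True: words scattered
print(matches_phrase(page_b, "to be or not to be"))    # False: not in order
print(matches_phrase(page_a, "to be or not to be"))    # True
```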

Basic Boolean

Whether an engine searches for all keywords or any of them depends on what is called its Boolean default. Search engines can default to Boolean AND (searching for all keywords) or Boolean OR (searching for any keywords). Of course, even if a search engine defaults to searching for all keywords, you can usually give it a special command to instruct it to search for any keyword. Lacking specific instructions, the engine falls back on its default setting.

Google’s Boolean default is AND, which means that if you enter query words without modifiers, Google will search for all your query words. For example, if you search for:

snowblower Honda "Green Bay"

Google will search for all the words. If you prefer to specify that any one word or phrase is acceptable, put an OR between them:

snowblower OR snowmobile OR "Green Bay"

Warning

Make sure you capitalize OR; a lowercase or won’t work correctly.

If you want to search for a particular term along with two or more other terms, group the other terms within parentheses, like so:

snowblower (snowmobile OR "Green Bay")

This query searches for the word “snowmobile” or phrase “Green Bay” along with the word “snowblower.” A stand-in for OR, borrowed from the computer-programming realm, is the | (pipe) character, as in:

snowblower (snowmobile | "Green Bay")
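Conceptually, the default AND plus parenthesized OR groups amount to one rule: every group must be satisfied by at least one of its terms. The following Python sketch models that rule on plain text. It is a simplification for illustration (Google's indexing and ranking are far more involved), and the helper names and sample pages are made up.

```python
import re
from typing import List

def normalize(page: str) -> str:
    # Lowercase, keep only words, and pad with spaces so whole words and
    # phrases can be found with a substring test (" bay " won't match "bayou").
    return " " + " ".join(re.findall(r"[a-z0-9']+", page.lower())) + " "

def contains(page: str, term: str) -> bool:
    # A term is a single word or a multiword phrase such as "Green Bay".
    return " " + term.lower() + " " in normalize(page)

def matches(page: str, groups: List[List[str]]) -> bool:
    # Outer list: implicit AND. Inner list: alternatives joined by OR (or |).
    return all(any(contains(page, term) for term in group) for group in groups)

# snowblower (snowmobile OR "Green Bay")
query = [["snowblower"], ["snowmobile", "Green Bay"]]
print(matches("Honda snowblower repair in Green Bay", query))  # True
print(matches("Used snowmobile parts", query))                 # False
```

A plain query such as snowblower Honda "Green Bay" is just the special case in which every group holds a single term.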

Negation

If you want to specify that a query item must not appear in your results, prepend a - (minus sign or dash):

snowblower snowmobile -"Green Bay"

This will search for pages that contain both the words “snowblower” and “snowmobile,” but not the phrase “Green Bay.”

Note that the - symbol must appear directly before the word or phrase that you don’t want. If there’s a space between them, as in the following query, it won’t work as expected:

snowblower snowmobile - "Green Bay"

Be sure, however, to place a space before the - symbol.
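Here is a toy Python model of negation, under the simplifying assumption that a page is just its lowercase words: every required term must be present and no excluded (-) term may be. The names and sample pages are hypothetical.

```python
import re
from typing import List

def normalize(page: str) -> str:
    # Lowercase words only, space-padded for whole-word and phrase tests.
    return " " + " ".join(re.findall(r"[a-z0-9']+", page.lower())) + " "

def contains(page: str, term: str) -> bool:
    return " " + term.lower() + " " in normalize(page)

def matches(page: str, required: List[str], excluded: List[str]) -> bool:
    # All required terms must appear; any excluded (-) term disqualifies the page.
    return (all(contains(page, t) for t in required)
            and not any(contains(page, t) for t in excluded))

# snowblower snowmobile -"Green Bay"
print(matches("Snowblower vs. snowmobile: a buyer's guide",
              ["snowblower", "snowmobile"], ["Green Bay"]))   # True
print(matches("Snowblower and snowmobile rentals in Green Bay",
              ["snowblower", "snowmobile"], ["Green Bay"]))   # False
```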

Explicit Inclusion

On the whole, Google will search for all the keywords and phrases that you specify (with the exception of those you’ve specifically negated with -, of course). However, there are certain words that Google will ignore because they are considered too common to be of any use in the search. These words (“I,” “a,” “the,” and “of,” to name a few) are called stop words.

You can force Google to take a stop word into account by prepending a + (plus) character, as in:

+the king

Stop words that appear inside phrase searches are not ignored. Searching for:

"the move" glam

will result in a more accurate list of matches than:

the move glam

simply because Google takes the word “the” into account in the first example but ignores it in the second.
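The combined effect of stop words, the + prefix, and quoting can be sketched as a query preprocessor. This is an assumed model, not Google's code: the stop list below holds only the four examples named above, whereas a real engine's list would be longer, and the function name is hypothetical.

```python
import re
from typing import List

# Only the examples named in the text; a real stop list is much larger.
STOP_WORDS = {"i", "a", "the", "of"}

def effective_terms(query: str) -> List[str]:
    """Return the terms that would actually be searched for."""
    terms = []
    # Split into "quoted phrases" and bare tokens.
    for token in re.findall(r'"[^"]*"|\S+', query):
        if token.startswith('"'):
            terms.append(token.strip('"'))  # phrase: stop words inside are kept
        elif token.startswith("+"):
            terms.append(token[1:])         # explicit inclusion beats the stop list
        elif token.lower() not in STOP_WORDS:
            terms.append(token)             # ordinary keyword
    return terms

print(effective_terms('"the move" glam'))  # ['the move', 'glam']
print(effective_terms("the move glam"))    # ['move', 'glam']
print(effective_terms("+the king"))        # ['the', 'king']
```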

Synonyms

Every so often, you get the feeling that you’re missing out on some useful results because the keyword or keywords you’ve chosen aren’t the only way to express what you’re looking for.

The Google synonym operator, the ~ (tilde) character, prepended to any number of keywords in your query, asks Google to include not only exact matches, but also what it thinks are synonyms for each of the keywords. Searching for:

~ape

turns up results for monkey, gorilla, chimpanzee, and other members of the ape family and related families (in both singular and plural forms), as if you’d searched for:

monkey gorilla chimpanzee

along with results for some words you’d never have thought to include in your query.

Google figures out synonyms algorithmically, so you may be surprised to find results that your garden-variety thesaurus would not have suggested. (Synonyms are bolded along with exact keyword matches on the results page, so they’re easy to spot.)
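One way to picture ~ is as an OR group: the marked word is replaced by "this word or any of its synonyms." The sketch below assumes a tiny hand-written synonym table purely for illustration; Google derives its synonyms algorithmically and does not publish them, and the function name is hypothetical.

```python
from typing import Dict, List

# Hypothetical synonym table, for illustration only.
SYNONYMS: Dict[str, List[str]] = {"ape": ["monkey", "gorilla", "chimpanzee"]}

def expand(query: str) -> List[List[str]]:
    """Turn a query into AND-ed OR groups; ~word admits the word or its synonyms."""
    groups = []
    for word in query.split():
        if word.startswith("~"):
            base = word[1:]
            groups.append([base] + SYNONYMS.get(base.lower(), []))
        else:
            groups.append([word])
    return groups

print(expand("~ape habitat"))
# [['ape', 'monkey', 'gorilla', 'chimpanzee'], ['habitat']]
```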

Number Range

One of the more difficult things to convey in an Internet search query is a range—of dates, currency, size, weight, height, or any two arbitrary values.

The number range operator, .. (two periods), looks for results that fall inside your specified numeric range.

Looking for that perfect pair of Prada pumps, size 5 or 6? Try this for size:

prada pumps size 5..6

Perhaps you’re looking to spend $800 to $1,000 on a nice digital SLR camera; Google for:

slr digital camera 3..5 megapixel $800..1000

The one thing to remember is always to provide some clue as to the meaning of the range, e.g., $, size, megapixel, kg, and so forth.

You can also use the number range syntax with just one number, making it the minimum or maximum of your query. Do you want to find some land in Montana that’s at least 500 acres? No problem:

acres Montana land 500..

On the other hand, you might want to make sure that raincoat you buy for your terrier doesn’t cost more than $30. That’s possible too:

raincoat dog ..$30
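The range syntax is easy to model: split on the two periods and treat a missing side as unbounded. This Python sketch (hypothetical function names) strips $ signs and thousands separators and ignores the unit clue that Google needs, so it illustrates only the bounds logic.

```python
from typing import Optional, Tuple

def _to_number(text: str) -> Optional[float]:
    # "$1,000" -> 1000.0; an empty side means "no bound".
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None

def parse_range(token: str) -> Tuple[Optional[float], Optional[float]]:
    """Parse 'a..b', 'a..', or '..b' into (low, high)."""
    low_text, sep, high_text = token.partition("..")
    if not sep:
        raise ValueError(f"not a number range: {token!r}")
    return _to_number(low_text), _to_number(high_text)

def in_range(value: float, token: str) -> bool:
    low, high = parse_range(token)
    return (low is None or value >= low) and (high is None or value <= high)

print(parse_range("$800..1000"))  # (800.0, 1000.0)
print(parse_range("500.."))       # (500.0, None)
print(in_range(35, "..$30"))      # False
```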

Tip

Google normally does not recognize special characters such as $ in the search process. But because the $ sign was necessary for the number feature, you can use it in all sorts of searches. Try the search "yard sale" bargains 10 and then "yard sale" bargains $10. Notice how the second search gives you far fewer results? That’s because Google is matching $10 exactly.

Simple Searching and Feeling Lucky

The I’m Feeling Lucky™ button is a thing of beauty. Rather than giving you a list of search results from which to choose, you’re whisked away to what Google believes is the most relevant page given your search (i.e., the first result in the list). Entering washington post and clicking the I’m Feeling Lucky button takes you directly to http://www.washingtonpost.com. Trying president will land you at http://www.whitehouse.gov.

Case Sensitivity

Some search engines are case-sensitive; that is, they search for queries based on how the queries are capitalized. A search for "GEORGE WASHINGTON" on such a search engine would not find “George Washington,” “george washington,” or any other case combination.

Google is case-insensitive. If you search for Three, three, THREE, or even ThrEE, you get the same results.

Full-Word Wildcards

Some search engines support a technique called stemming, in which you add a wildcard character, usually * (asterisk) but sometimes ? (question mark), to part of your query, asking the search engine to return variants of that query using the wildcard as a placeholder for the rest of the word. For example, moon* would find moons, moonlight, moonshot, etc.

Google doesn’t support explicit stemming. It didn’t use to support stemming at all, but now it implicitly stems for you. So, canine dietary will yield results for dog diet, diets, and other variations on the theme.

Google does offer a full-word wildcard. While a wildcard can’t stand in for part of a word, you can insert a wildcard (Google’s wildcard character is *) into a phrase, and the wildcard will act as a substitute for one full word. Searching for three * mice, therefore, finds three blind mice, three blue mice, three green mice, etc.

What good is the full-word wildcard? It’s certainly not as useful as stemming, but then again, it’s not as confusing to the beginner. * is a stand-in for one word; * * signifies two words, and so on. The full-word wildcard comes in handy in the following situations:

  • Checking the frequency of certain phrases and derivatives of phrases, such as intitle:"methinks the * doth protest too much" and intitle:"the * of Seville" (intitle: is described next in “Special Syntax”).

  • Filling in the blanks on a fitful memory. Perhaps you remember only a short string of song lyrics; search using only what you remember rather than randomly reconstructed full lines.

    Let’s take as an example the disco anthem “Good Times” by Chic. Consider the following line: “You silly fool, you can’t change your fate.”

    Perhaps you’ve heard that lyric, but you can’t remember whether “fool” is the right word. If you guess wrong (if the correct line is, for example, “You silly child, you can’t change your fate”), your search will find no results, and you’ll come away with the sad conclusion that no one on the Internet has bothered to post lyrics to Chic songs.

    The solution is to run the query with a wildcard in place of the unknown word, like so:

    "You silly *, you can't change your fate"

    You can use this technique for quotes, song lyrics, poetry, and more. Be sure, however, to include enough of the quote to find unique results. Searching for "you * fool" will glean far too many irrelevant hits.
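A full-word wildcard maps naturally onto a regular expression in which each standalone * matches exactly one word. This is a sketch of the behavior as described above, not Google's implementation, and the function name is hypothetical; note that "* *" becomes two single-word slots.

```python
import re

WORD = r"[\w']+"  # one whole word; apostrophes allowed, as in "can't"

def wildcard_to_regex(phrase: str) -> "re.Pattern":
    """Compile a phrase in which each * stands for exactly one full word."""
    parts = []
    for token in phrase.split():
        # Escape the literal text, then turn each escaped * into "one word".
        parts.append(re.escape(token).replace(r"\*", WORD))
    # Words may be separated by any run of whitespace; match case-insensitively.
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

lyric = wildcard_to_regex("You silly *, you can't change your fate")
print(bool(lyric.search("You silly child, you can't change your fate")))  # True
print(bool(wildcard_to_regex("three * mice").search("three mice")))        # False
```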

Special Syntax

In addition to the basic AND, OR, and phrase searches, Google offers some rather extensive special syntax for narrowing your searches.

As a full-text search engine, Google indexes entire web pages instead of just titles and descriptions. Additional commands, called special syntax, or advanced operators, let Google users search specific parts of web pages for specific types of information. This comes in handy when you’re dealing with more than eight billion web pages and need every opportunity to narrow your search results. Specifying that your query words must appear only in the title or URL of a returned web page is a great way to narrow your results without making your keywords themselves too specific. Following are descriptions of the special syntax elements, ordered by common usage and function.

Tip

Some of these syntax elements work well in combination. Others don’t fare quite as well. Still others do not work at all. For a detailed discussion of what does and does not mix, see “Mixing Syntax” later in this chapter.

intitle:

intitle: restricts your search to the titles of web pages. The variation allintitle: finds pages in which all the specified words appear in the title of the web page. Using allintitle: is basically the same as using intitle: before each keyword:

intitle:"george bush"
allintitle:"money supply" economics

You may wish to avoid the allintitle: variation because it doesn’t mix well with some of the other syntax elements.

intext:

intext: searches only body text (i.e., it ignores link text, URLs, and titles). While its uses are limited, it’s perfect for finding query words that might be too common in URLs or link titles:

intext:"yahoo.com"
intext:html

There’s an allintext: variation; but again, this doesn’t play well with others.

inanchor:

inanchor: searches for text in a page’s link anchors. A link anchor is the descriptive text of a link. For example, the link anchor in the HTML code <a href="http://www.oreilly.com">O'Reilly Media</a> is “O’Reilly Media.”

inanchor:"tom peters"

As with other in*: syntax elements, there’s an allinanchor: variation, which works in a similar way (i.e., all the keywords specified must appear in a page’s link anchors).

site:

site: allows you to narrow your search by a site or by a top-level domain. The AltaVista search engine, by contrast, has two syntax elements for this function (host: and domain:), but Google has only the one:

site:loc.gov
site:thomas.loc.gov
site:edu
site:nc.us

Be aware that site: is no good for searching for a page that exists beneath the main or default site (i.e., in a subdirectory such as /~sam/album/). For example, if you’re looking for something below the main GeoCities site, you can’t use site: to find all the pages in http://www.geocities.com/Heartland/Meadows/6485/; Google returns no results. Use inurl: instead.
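The difference comes down to what is compared: site: looks only at the host name, so a path such as /Heartland/Meadows/6485/ never enters into it. Here is a minimal Python model, assuming suffix matching on dot boundaries (the function name is hypothetical):

```python
from urllib.parse import urlsplit

def site_matches(url: str, site: str) -> bool:
    """True if the URL's host is `site` itself or ends with "." + site."""
    host = (urlsplit(url).hostname or "").lower()
    site = site.lower().strip(".")
    return host == site or host.endswith("." + site)

print(site_matches("http://thomas.loc.gov/home/", "loc.gov"))  # True
print(site_matches("http://www.ucla.edu/", "edu"))             # True
# A path is never part of the host, so this cannot match:
print(site_matches("http://www.geocities.com/Heartland/Meadows/6485/",
                   "www.geocities.com/Heartland"))             # False
```

Matching on a dot boundary also keeps site:loc.gov from picking up an unrelated host such as badloc.gov.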

inurl:

inurl: restricts your search to the URLs of web pages. This syntax usually works well for finding search and help pages because they tend to be regular in composition. An allinurl: variation finds all the words listed in a URL but doesn’t mix well with some other special syntax:

inurl:help
allinurl:search help

You’ll see that using the inurl: query instead of the site: query has one immediate advantage: you can use it to search subdirectories.

Tip

While the http:// prefix in a URL is ignored by Google when used with site:, search results come up short when it is included in an inurl: query. Be sure to remove prefixes in any inurl: query for the best (read: any) results.

link:

link: returns a list of pages that link to the specified URL. Enter link:www.google.com and you’ll get a list of pages that link to the Google home page, http://www.google.com (not anywhere in the google.com domain). Don’t worry about the http:// bit; you don’t need it and, indeed, Google appears to ignore it even if you do put it in. link: works just as well with “deep” URLs—http://www.raelity.org/apps/blosxom/, for instance—as with top-level URLs such as raelity.org.

cache:

cache: finds a copy of the page that Google indexed even if that page is no longer available at its original URL or has since changed its content completely:

cache:www.yahoo.com

If Google returns a result that appears to have little to do with your query, you’re almost sure to find what you’re looking for in the latest cached version of the page at Google.

The Google cache is particularly useful for retrieving a previous version of a page that changes often.

filetype:

filetype: searches the suffixes or filename extensions. These are usually, but not necessarily, different file types; filetype:htm and filetype:html will give you different result counts, even though they’re the same file type. You can even search for different page generators—such as ASP, PHP, CGI, and so forth—presuming the site isn’t hiding them behind redirection and proxying. Google indexes several different Microsoft formats, including PowerPoint (.ppt), Excel (.xls), and Word (.doc):

homeschooling filetype:pdf
"leading economic indicators" filetype:ppt

related:

related:, as you might expect, finds pages that are related to the specified page. This is a good way to find categories of pages; a search for related:google.com returns a variety of search engines, including Lycos, Yahoo!, and Northern Light:

related:www.yahoo.com
related:www.cnn.com

Occasionally, though increasingly rarely, you’ll find that Google has no related pages for a given page.

info:

info: provides a page of links to more information about a specified URL. This information includes a link to the URL’s cache, a list of pages that link to the URL, pages that are related to the URL, and pages that contain the URL:

info:www.oreilly.com
info:www.nytimes.com/technology

Note that this information is dependent on whether Google has indexed the specified URL; if it hasn’t, the information will obviously be far more limited.

phonebook:

phonebook:, as you might expect, looks up phone numbers:

phonebook:John Doe CA
phonebook:(510) 555-1212

The phonebook is covered in detail in “Google Phonebook: Let Google’s Fingers Do the Walking” [Hack #5].

define:

define: gives you a page full of definitions of a word from around the Web:

define:paradigm

Google often displays related phrases in addition to definitions and the URLs where the definitions were found.

movie:

Use the movie: syntax to find reviews of movies on the Web, like this:

movie:matrix

You can also use a zip code or a city and state combination to find local theater listings and movie showtimes:

movie:97333
movie:corvallis, or

music:

music: explicitly searches for music-related information:

music:pink floyd

You’re given a page that splits results into matching artists, albums, and lyrics, and you can choose to explore any of these areas in depth.

Mixing Syntax

There was a time when you couldn’t mix Google’s special syntax elements; you were limited to one per query. Even as Google released ever more powerful special syntax elements, not being able to combine them for their composite power stunted many a search.

This has since changed. While there remain some syntax elements that you just can’t mix, there are plenty to combine in clever and powerful ways. A thoughtful combination can do wonders to narrow a search.

How Not to Mix Syntax

There are some simple rules to follow when mixing syntax elements. These, for the most part, revolve around how not to mix:

  • Don’t mix syntax elements that will cancel out each other, such as:

    site:ucla.edu -inurl:ucla

    Here, you’re saying you want all results to come from ucla.edu, but that those results should not contain the string “ucla” in their URLs. Obviously, that’s not going to produce many URLs.

  • Don’t overuse single syntax elements, as in:

    site:com site:edu

    While you might think you’re asking for results from either .com or .edu sites, what you’re actually saying is that results should come from both simultaneously. Obviously, a single result can come from only one domain. Take the example perl site:edu site:com. This search will get you exactly zero results. Why? Because a result page cannot come from a .edu domain and a .com domain at the same time. If you want results from .edu and .com domains only, rephrase your search like this:

    perl (site:edu | site:com)

    With the pipe character (|), you specify that you want results to come either from the .edu or the .com domain.

  • Don’t use allinurl: or allintitle: when mixing syntax. It takes a careful hand not to misuse these in a mixed search. Instead, stick to inurl: or intitle:. If you don’t put allinurl: in exactly the right place, you’ll create odd search results. Let’s look at an example:

    allinurl:perl intitle:programming

    At first glance, it looks like you’re searching for the string “perl” in the result URL and the word “programming” in the title. And you’re right: this will work fine. But what happens if you move allinurl: to the right of the query?

    intitle:programming allinurl:perl

    This won’t bring any results. Stick to inurl: and intitle:, which are much more forgiving of where you put them in a query.

    The same advice goes for allintext: and allinanchor:.

  • Don’t use so much syntax that you get too narrow, as in:

    intitle:agriculture site:ucla.edu inurl:search

    You might find that your search is too narrow to give you any useful results. If you’re trying to find something so specific that you think you need a narrow query, start by building a little bit of the query at a time. Say you want to find plant databases at UCLA. Instead of starting with the query

    intitle:plants site:ucla.edu inurl:database

    try something simpler:

    databases plants site:ucla.edu

    and then try adding syntax to keywords you’ve already established in your search results:

    intitle:plants databases site:ucla.edu

    or:

    intitle:database plants site:ucla.edu
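The rule about overusing a single syntax element is plain Boolean logic, which a few lines of Python make visible. This sketch assumes a host-suffix reading of site:, leaves out the keyword perl for brevity, and uses made-up sample URLs: the AND filter corresponds to perl site:edu site:com and the OR filter to perl (site:edu | site:com).

```python
from typing import List
from urllib.parse import urlsplit

def site_matches(url: str, site: str) -> bool:
    host = (urlsplit(url).hostname or "").lower()
    return host == site or host.endswith("." + site)

def filter_and(urls: List[str], sites: List[str]) -> List[str]:
    # site:edu site:com -> every site: constraint must hold at once
    return [u for u in urls if all(site_matches(u, s) for s in sites)]

def filter_or(urls: List[str], sites: List[str]) -> List[str]:
    # (site:edu | site:com) -> any one constraint is enough
    return [u for u in urls if any(site_matches(u, s) for s in sites)]

# Illustrative sample URLs, not real search results.
URLS = ["http://www.perl.com/", "http://cs.stanford.edu/perl/", "http://www.perl.org/"]
print(filter_and(URLS, ["edu", "com"]))  # []
print(filter_or(URLS, ["edu", "com"]))   # ['http://www.perl.com/', 'http://cs.stanford.edu/perl/']
```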

How to Mix Syntax

If you’re trying to narrow down search results, the intitle: and site: syntax elements are your best bet.

Titles and sites

For example, say you want to get an idea of what databases are offered by the state of Texas. Run this search:

intitle:search intitle:records site:tx.us

You’ll find something on the order of 30 very targeted results. And, of course, you can narrow down your search even more by adding keywords:

birth intitle:search intitle:records site:tx.us

It doesn’t seem to matter whether you put plain keywords at the beginning or at the end of the search query; I put them at the beginning because they’re easier to keep track of.

The site: syntax, unlike site syntax on other search engines, allows you to get as general as a domain suffix (site:com) or as specific as a domain or subdomain (site:thomas.loc.gov). So if you’re looking for records in El Paso, you can use this query:

intitle:records site:el-paso.tx.us

and you’ll get approximately one result.

Title and URL

Sometimes you want to find a certain type of information, but you don’t want to narrow by title. Instead, you want to narrow by theme (e.g., you want sites about “help” or about “search engines”). In such cases, you need to search text within the URL.

The inurl: syntax searches for a string in the URL but doesn’t count it if it appears within a larger word. So, for example, if you search for inurl:research, Google will not find pages from http://www.researchbuzz.com, but it will find pages from http://www.research-councils.ac.uk.
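A rough model of this whole-word behavior splits the URL on every non-alphanumeric character and looks for an exact token. Treat it as an assumption that fits the two examples above, not a description of Google's actual tokenizer; the function name is hypothetical.

```python
import re

def inurl_matches(url: str, word: str) -> bool:
    """True if `word` is a whole token of the URL (split on non-alphanumerics)."""
    tokens = re.split(r"[^a-z0-9]+", url.lower())
    return word.lower() in tokens

print(inurl_matches("http://www.researchbuzz.com", "research"))         # False
print(inurl_matches("http://www.research-councils.ac.uk", "research"))  # True
```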

Say you want to find information on neurosurgery, with an emphasis on learning or assistance. Try:

intitle:neurosurgery inurl:help

This returns a more manageable 880 or so results. The whole point is to get a number of results that includes what you need but isn’t so large as to be overwhelming. If you find that 880 results are too many, you can easily mix the site: syntax into the search and limit your results to universities:

intitle:neurosurgery inurl:help site:edu

Beware, however, of using too much special syntax. As mentioned earlier, you can quickly narrow your search down to no results at all.

The Antisocial Syntax Elements

The antisocial syntax elements don’t mix and should be used individually for maximum effect. If you try to use them with other syntax elements, you won’t get any results.

The syntax elements that request special information (rphonebook:, bphonebook:, movie:, music:, define:, and phonebook:) are all antisocial. That is, you can’t mix them and expect to get a reasonable result.

The other antisocial syntax element is link:, which shows pages that link to a specified URL. Wouldn’t it be great if you could specify the domains you want the pages to be from? Sorry, you can’t. The link: syntax does not mix with anything else—not even plain old keywords.

For example, say you want to find out which pages link to O’Reilly Media, Inc., but you don’t want to include pages from the .edu domain. The query link:www.oreilly.com -site:edu will not work because the link: syntax does not work in combination. Well, that’s not quite correct; you will get results, but they’ll be for the phrase “link:www.oreilly.com” from domains that are not .edu.

If you want to search for links and exclude the .edu domain, there’s no single command that absolutely works. This one’s a good try, though:

inanchor:oreilly -inurl:oreilly -site:edu

This search looks for the word “oreilly” in anchor text (the text that’s used to define links), excludes pages that contain “oreilly” in their URLs (e.g., oreilly.com), and, finally, excludes pages that come from the .edu domain.

But this type of search is nowhere near complete. It finds only those links to O’Reilly whose anchor text includes the string “oreilly”: if someone creates a link such as <a href="http://perl.oreilly.com/">Camel Book</a>, it won’t be found by the preceding query. Furthermore, -inurl:oreilly excludes every page whose URL contains the string “oreilly,” not just pages on oreilly.com, so it also throws out other sites with “oreilly” in their URLs that may well link to O’Reilly. You could alter the query slightly to omit the oreilly.com site itself but not other sites containing the string “oreilly”:

inanchor:oreilly -site:oreilly.com -site:edu

However, you would still include many O’Reilly sites—XML.com and MacDevCenter.com, for instance—that aren’t at oreilly.com.

All the Possibilities

While it is possible to write down every syntax-mixing combination and briefly explain how they might be useful, there wouldn’t be room for much else in this book.

Experiment. Experiment a lot. Constantly keep in mind that most of these syntax elements do not stand alone, and you can get more done by combining them than by using them individually.

Depending on the kind of research you are doing, different patterns will emerge over time. For example, you may discover that focusing on only PDF documents (filetype:pdf) finds you the results you need. You may discover that you should concentrate on specific file types in specific domains (filetype:ppt site:tompeters.com). Mix up the syntax in as many ways as is relevant to your research and see what you get.

As with anything else, the more you use Google’s special syntax, the more natural it will become to you. And Google is constantly adding more, much to the delight of regular web combers.

If, however, you want something more structured and visual than a single query line, Google’s Advanced Search should fit the bill.

Advanced Search

Google’s default simple search allows you to do quite a bit, but not everything. Google’s Advanced Search page (http://www.google.com/advanced_search), shown in Figure 1-1, provides more options, such as date search and filtering, with “fill in the blank” searching options for those who don’t take naturally to memorizing special syntax.


Figure 1-1. Google’s Advanced Search page

Most of the options presented on this page are self-explanatory, but we’ll take a quick look at the kinds of searches that would be more difficult using the single-text-field interface of a simple search.

Query Words

Because Google uses Boolean AND by default, it’s sometimes hard to build out the nuances of a particular query logically. Using the text boxes at the top of the Advanced Search page, you can specify words that must all appear, an exact phrase, a list of words at least one of which must appear, and words to be excluded.

Language

Using the Language pull-down menu, you can specify the language all returned pages must be in, from Arabic to Turkish.

File Format

The File Format option lets you include or exclude several different file formats, including Microsoft Word and Excel. A couple of Adobe formats (most notably PDF) and Rich Text Format are options here, too. This is where the Advanced Search is at its most limited: there are literally dozens of file formats that Google can search for, and this set of options represents only a small subset. To get at the others, use the filetype: special syntax described earlier in “Special Syntax.”

Date

Date allows you to specify search results updated in the last three, six, or twelve months. This date search is much more limited than the daterange: special syntax, which can give you results as narrow as one day, but Google stands behind the results generated using the Date option on the Advanced Search, while not officially sanctioning the use of the daterange: search.

Occurrences

Using the Occurrences pull-down menu, you can specify where the terms should occur. The options here, other than the default, generally reflect the allin*: syntax elements—in the title (allintitle:), text (allintext:), URL (allinurl:), and link anchors (allinanchor:) of the page.

Domain

The Domain feature is an interface to the site: syntax. It also allows negation (explained earlier) to explicitly not return results from a site or domain.

Usage Rights

If you’re looking for materials that you can legally reuse in your reports, presentations, or other compilations, you can specify that you’re looking for materials licensed with alternative copyright systems, such as Creative Commons licenses (http://creativecommons.org). You can look for files that are “free to use or share,” “free to use, share, or modify,” and other variations on this theme.

Safe Search

Google’s Advanced Search also gives you the option to filter your results using SafeSearch. SafeSearch filters only sexually explicit content (as opposed to some filtering systems that filter pornography, hate material, gambling information, etc.). Please remember that machine filtering isn’t perfect.

Page-Specific Search

The last two fields in the form provide a simple way to use the related: and link: syntaxes. You can use these special searches to find more information about any specific site.

The Advanced Search page is handy when you need to use its unique features or need help putting together a complicated query. Its “fill-in-the-blank” interface comes in handy for the occasional searcher or anyone who wants to get an advanced search exactly right. That said, it is limiting in other ways. It’s difficult to use mixed syntax or build a single syntax search using OR. For example, there’s no way to search for site:edu OR site:org using the Advanced Search. This search must be done from the Google search box.
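Because that search has to go through the search box, it can also be scripted by building the query string yourself. This sketch assumes the results page is https://www.google.com/search with the query passed in the q parameter (the long-standing public URL form, not a documented API); urlencode from Python's standard library takes care of escaping the colons and spaces. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: the public results page, with the query passed as q.
SEARCH_URL = "https://www.google.com/search"

def build_search_url(query: str) -> str:
    # urlencode escapes ":" as %3A, "|" as %7C, and turns spaces into "+".
    return SEARCH_URL + "?" + urlencode({"q": query})

print(build_search_url("site:edu OR site:org"))
# https://www.google.com/search?q=site%3Aedu+OR+site%3Aorg
```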

Of course, there’s another way you can alter the search results that Google gives you, and it doesn’t involve the basic search input or the Advanced Search page. It’s the preferences page, described in “Setting Preferences” later in this chapter.

Quick Links

If you’re a Google regular, you’ve no doubt noticed those snippets of linked information proliferating near the top-left of the first results page (see Figure 1-2). Where once there was only a sponsored link or two between you and your results, now there are spelling suggestions, news headlines, stock quotes, and all other manner of bits and bobs of rather useful information.


Figure 1-2. Quick links augmenting search results with relevant, current, and local information

Google is going beyond web search results to include relevant finds from its other properties and those of third parties. Here, briefly, is the current catalog of quick links:

Spelling

One nice side effect of Google’s listening to the Web is that it picks up a lot of words along the way. Some appear in the dictionary, while others haven’t quite made their way into common parlance. Some are made up, while others are simply misspelled. Query Google for something that is commonly spelled another way, and it’ll proffer some suggestions. “Consult the Dictionary” delves further into the wonders of Google’s spell checker.

Definitions

TLAs (that’s “three-letter acronyms”) and geek speak abound. Rather than smiling knowingly when you’ve not a clue what someone just said, ask Google if it knows what your friend, boss, or medical professional is talking about. Prepend just about any word, obscure or garden-variety, with define (e.g., define happy), and the first item on your results page will in all probability be a definition pulled from one of any number of web dictionaries. Use define: (note the colon—e.g., define:osteichthyes) to pull up a whole page full of definitions [Hack #6].

News Headlines

Google News (http://news.google.com; see Chapter 3) scrapes stories from thousands of news sources. Don’t be surprised if there’s something new and noteworthy related to your Google search.

Travel Information

Before you hop on that plane, Google your destination airport's name (e.g., Los Angeles) or code (e.g., LAX) along with the word airport. Click the “View conditions at [in this case] Los Angeles International (LAX), Los Angeles, California” link to visit the Federal Aviation Administration’s (FAA) real-time airport status information. At the time of this writing, LAX has no destination-specific delays; departures are experiencing gate-hold delays of less than 15 minutes, and arrivals are experiencing airborne delays of less than 15 minutes.

Street Maps

If Google gleans something that looks like a geographic location in your search query, it’ll provide a link to a Google Map pinpointing the location, along with links to Yahoo! and MapQuest maps of the area.

Google Local

Include the name of a city, state, or zip code anywhere in the U.S. or Canada in your search, and Google Local (http://local.google.com) just might suggest a local find. Google for indian food portland oregon, and you’ll find yourself tempted by the flavors of Swagat Indian Cuisine on NW Lovejoy Street or India Grill on E Burnside.

Calculator

You might remember a few important numbers from math class: pi or e or C, for instance. But numbers hold a very special place in Google’s collective heart; after all, the name Google comes from googol, or 10^100 (a 1 followed by 100 zeros). So it shouldn’t come as a surprise that the geeks at Google have taught the search engine to pay attention to particular patterns of numbers, including anything that looks like a calculation. Type an arithmetic expression into the search form, and you’ll get an answer back:

365/12
9*3

You can also use the Google Calculator to convert units. Simply type out the conversion you want to perform:

12 ounces in pounds
3 meters in yards
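Behind a query such as 12 ounces in pounds, the arithmetic is just a ratio of unit sizes. Here is a minimal Python sketch of how a query in that shape could be parsed and evaluated. The regular expression, the small unit table, and the convert function are all assumptions made for illustration; this is not how Google's calculator is actually built.

```python
import re

# Hypothetical unit table for this sketch: unit -> (dimension, size in base units).
# Base units are grams for mass and meters for length; these sizes are exact definitions.
UNITS = {
    "ounces": ("mass", 28.349523125),
    "pounds": ("mass", 453.59237),
    "meters": ("length", 1.0),
    "yards": ("length", 0.9144),
}

# Matches queries shaped like "<amount> <unit> in <unit>".
QUERY = re.compile(r"\s*(\d+(?:\.\d+)?)\s+([a-z]+)\s+in\s+([a-z]+)\s*", re.IGNORECASE)


def convert(query: str) -> float:
    """Evaluate a conversion query such as '12 ounces in pounds'."""
    match = QUERY.fullmatch(query)
    if match is None:
        raise ValueError(f"not a conversion query: {query!r}")
    amount = float(match.group(1))
    src, dst = match.group(2).lower(), match.group(3).lower()
    if src not in UNITS or dst not in UNITS:
        raise ValueError(f"unknown unit in {query!r}")
    src_dim, src_size = UNITS[src]
    dst_dim, dst_size = UNITS[dst]
    if src_dim != dst_dim:
        raise ValueError(f"cannot convert {src} to {dst}")
    # Express the amount in base units, then in the target unit.
    return amount * src_size / dst_size


print(convert("12 ounces in pounds"))
print(round(convert("3 meters in yards"), 4))
```

Storing each unit's size relative to one base unit means adding a unit takes a single table entry, rather than a factor for every pair of units.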

Google can also convert currency in the same way. Simply include the two types of currency you’d like to compare:

12 USD in Euros

Google by Numbers, 1-2-3

In addition to calculations, Google looks for special patterns usually found in particular reference numbers, including:

  • UPS, FedEx, and U.S. Postal Service tracking numbers (e.g., 1Z9999W99999999999). Google links to the package service’s tracking page and fills in the number to get you going.

  • Vehicle ID (VIN) numbers (e.g., AAAAA999A9AA99999).

  • UPC codes (e.g., 073333531084) at http://www.upcdatabase.com.

  • Telephone area codes (e.g., 510) at http://www.whitepages.com.

  • Patent numbers (e.g., patent 4920273) in the U.S. Patent Database.

  • Federal Aviation Administration (FAA) airplane registration numbers (e.g., n199ua). These are particularly entertaining when you’re waiting to board your plane, smartphone in hand and “Google on the Go.” Look for them on the plane’s tail.

  • Federal Communications Commission (FCC) equipment ID numbers (e.g., fcc B4Z-34009-PIR).
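Spotting these reference numbers amounts to recognizing their shapes. The Python sketch below shows the idea with regular expressions. The patterns are simplified assumptions based on the examples above (an 18-character UPS number starting with 1Z, a 17-character VIN, a 12-digit UPC, an N-number, and so on), not Google's actual recognition rules. The UPC check-digit test uses the standard UPC-A weighting.

```python
import re
from typing import Optional

# Hypothetical, simplified patterns for the formats listed above.
# Order matters: the first pattern that matches the whole query wins.
PATTERNS = [
    ("ups_tracking", re.compile(r"1Z[0-9A-Z]{16}", re.IGNORECASE)),
    ("vin", re.compile(r"[A-HJ-NPR-Z0-9]{17}", re.IGNORECASE)),  # VINs never use I, O, or Q
    ("upc", re.compile(r"\d{12}")),
    ("faa_registration", re.compile(r"N[1-9]\d{0,4}[A-Z]{0,2}", re.IGNORECASE)),
    ("patent", re.compile(r"patent\s+\d{6,8}", re.IGNORECASE)),
    ("fcc_id", re.compile(r"fcc\s+[A-Z0-9-]+", re.IGNORECASE)),
    ("area_code", re.compile(r"[2-9]\d{2}")),
]


def classify(query: str) -> Optional[str]:
    """Return the name of the first pattern matching the whole query, or None."""
    text = query.strip()
    for name, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return name
    return None


def upc_check_digit_ok(upc: str) -> bool:
    """Validate a 12-digit UPC-A code against its check digit."""
    if not re.fullmatch(r"\d{12}", upc):
        return False
    digits = [int(c) for c in upc]
    # Digits in the 1st, 3rd, ..., 11th positions are weighted by 3.
    total = 3 * sum(digits[0:11:2]) + sum(digits[1:11:2])
    return (10 - total % 10) % 10 == digits[11]


for q in ("073333531084", "n199ua", "patent 4920273", "510"):
    print(q, "->", classify(q))
```

A real recognizer would need more context than this; a bare three-digit number, for instance, is only a guess at an area code.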

Stock Quotes

Search for a stock symbol [Hack #16] and you’ll be quick-linked to the company’s financial information at Google Finance, Yahoo! Finance, and a number of other sites that offer stock information.

Froogle Products

If Froogle (http://froogle.google.com) finds a product that seems to be what you’re after, it’ll link to “Product search results” and to two or three offerings at sites such as eBay, Golfsmith, Buy.com, and many more.

Weather

Type in the word weather followed by a city name for a quick look at current conditions and the five-day forecast.

There are sure to be more quick links by the time you read this. To keep apprised of what’s new, periodically visit the Google Web Search Features page (http://www.google.com/help/features.html), or just keep Googling and see what appears.

Language Tools

In the early days of the Web, it seemed like most web pages were in English. But as more and more countries have come online, materials have become available in a variety of languages—including languages that have not originated from a particular country (such as Esperanto and Klingon).

Google offers several language tools, including one for translation and one for Google’s interface. The interface option is much more extensive than the translation option, but the translation option has a lot to offer.

The language tools are available by clicking the Language Tools link on the front page or by going to http://www.google.com/language_tools.

Search Specific Languages or Countries

The first tool allows you to search for materials from a certain country and/or in a certain language. This is an excellent way to narrow your searches; searching for French pages from Japan gives you far fewer results than searching for French pages from France. You can narrow the search further by searching for a slang word in another language. For example, search for the English slang word bonce on French pages from Japan.

Translate

The second tool on this page allows you to translate either a block of text or an entire web page from one language to another. Most of the translations are to or from English.

Machine translation is not nearly as good as human translation, so don’t rely on this translation either as the basis of a search or as a completely accurate translation of the page you’re looking at. Instead, use it to get the gist of whatever it translates.

You don’t have to come to this page to use the translation tools. When you enter a search, you’ll see that some search results that aren’t in your language of choice (which you set via Google’s preferences) have “[Translate this page]” next to their titles. Click on one of these and you’ll be presented with a framed, translated version of the page. The Google frame at the top allows you to view the original version of the page, as well as return to the results or view a copy suitable for printing.

Interface Language

The third tool lets you choose the interface language for Google, from Afrikaans to Welsh. Some of these languages are imaginary (Bork, bork, bork! and Elmer Fudd), but they do work.

Warning

Be warned that if you set your language preference to Klingon, for example, you’ll need to know Klingon to figure out how to set it back.

As one of our Google Hacks readers, Jacek Artymiak, pointed out (http://hacks.oreilly.com/pub/h/360), if English is your native tongue, point your browser at http://www.google.com/intl/en. If you’re not an English speaker but remember or care to guess at the language code (e.g., zu for Zulu), drop it in instead of en at the end of the URL. Further discussion revealed that simply suffixing the http://www.google.com URL with a period—http://www.google.com.—has the same delocalizing effect, reverting the interface to English.

If you’re really stuck, delete the Google cookie from your browser and reload the page; this should reset all preferences to the defaults.

How does Google manage to have so many interface languages when it has so few translation languages? The Google in Your Language program gathers volunteers from around the world to translate Google’s interface. (You can get more information on this program at http://www.google.com/intl/en/language.html.)

Local Domain

Finally, the Language Tools page contains a list of region-specific Google home pages—over 100 of them, from Deutschland to the Pitcairn Islands.

Making the Most of Google’s Language Tools

While you shouldn’t rely on Google’s translation tools to give you more than the gist of the meaning (since machine translation isn’t that good), you can use translations to narrow your searches. I described the first method earlier: use unlikely combinations of languages and countries to narrow your results. The second way involves using the translator.

Select a word that matches your topic and use the translator to translate it into another language. (Google’s translation tools work very well for single-word translations like this.) Now, search for that word in a country and language that don’t match it. For example, you might search for the German word “Landstraße” (highway) on French pages in Canada. Of course, you must be sure to use words that don’t have English equivalents or you’ll be overwhelmed with results.

Whew! By now it should be fairly clear that a simple interface such as the one on Google’s front page does not necessarily imply limited power. Still waters run deep indeed. Now that we have all the tools, tips, and techniques under our belt to help us ask Google for what we want before it dives into the depths of web content, it’s time to turn our attention to understanding what it brings back to the surface.

Anatomy of a Search Result

You’d think a list of search results would be pretty straightforward, wouldn’t you—just a page title and a link, possibly a summary? Not so with Google. Google encompasses so many search properties and has so much data at its disposal that it fills every results page to the rafters. Within a typical search result, you can find sponsored links, ads, links to stock quotes, page sizes, spelling suggestions, and more.

By knowing more of the nitty-gritty details of what’s what in a search result, you’ll be able to make some guesses (“Wow, this page that links to my page is very large; perhaps it’s a link list”) and correct roadblocks (“I can’t find my search term on this page; I’ll check the version Google has cached”).

Let’s use the word “flowers” to examine this anatomy. Figure 1-3 shows the result page for flowers.

Figure 1-3. Results page for “flowers”

First, note that at the top of the page a selection of tabs allows you to repeat your search across other Google search categories besides web pages, including Google Images, Google Groups, Google News, Froogle, and Google Maps. Beneath that is a count of the number of results and how long the search took: about 524,000,000 results in 0.14 seconds (this will vary, sometimes by quite a bit).

Sometimes results/sites are called out on colored backgrounds at the top or right of the results page (see Figure 1-3). These are called sponsored links (read: advertisements). Google has a policy of very clearly distinguishing ads and sticking to text-based advertising only rather than throwing flashing banners in your face like other sites do.

You might also see Quick Links for some queries that Google thinks it has a direct answer for, but for the most part you’ll see a list of 10 results. The first real (i.e., nonsponsored) result of the search for flowers is shown in Figure 1-4.

Figure 1-4. A typical search result

Let’s break this down into chunks, shall we?

The top line of each result is the page title, hyperlinked to the original page.

The second line offers a brief extract from this site. Sometimes this is a description of the site or a selected sentence or two. Sometimes it’s HTML mush. Google tends to use description metatags when they’re available; it’s rare when you can look at a Google search result and not have even a modicum of an idea what the site is about.

The next line sports several informative bits of metadata. First, there’s the URL. Second, there’s the size of the page (Google makes the page size available only if the page has been cached). Third, there’s a link to a cached version of the page if one is available. Finally, there’s a link to find similar pages.

Why would you bother reading the search-result metadata? Why not simply visit the site and see if it has what you want?

If you have a broadband connection and all the time in the world, you might not want to bother checking out the metadata. But if you have a slower connection and time is at a premium, consider the search-result information.

First, check the page summary. Where does your keyword appear? Does it appear in the middle of a list of site names? Does it appear in a way that makes it clear that the context is not what you’re looking for?

Check the size of the page if it’s available. Is the page very large? Perhaps it’s just a link list—a page full of hyperlinks, as the name suggests. Is it just 1 or 2 KB? It might be too small to find the level of detail that you’re looking for. If your aim is link lists, be on the lookout for pages larger than 20 KB, and see “Browse the Google Directory” [Hack #1].

Tip

Page size in Google results will never be more than 101 KB. This is because Google doesn’t index more than 101 KB of a given web page.

Setting Preferences

Google’s Preferences page, shown in Figure 1-5, provides a nice, easy way to set and save your search preferences.

Figure 1-5. Google’s Preferences page

Interface Language

You can set your Interface Language, the language in which tips and messages are displayed.

Search Language

Not to be confused with Interface Language, Search Language restricts the languages that are considered when searching Google’s page index. The default is any language, but you could be interested only in web pages written in Chinese and Japanese, or French, German, and Spanish—the combination is up to you.

SafeSearch Filtering

Google’s SafeSearch filtering affords you a method of avoiding search results that may offend your sensibilities. No filtering means you’re offered anything in the Google index. Moderate filtering rules out explicit images, but not explicit language. Strict filtering filters both text and images. The default is moderate filtering.

Number of Results

By default, Google displays 10 results per page. For more results, click any of the Result Page: 1 2 3... links at the bottom of each result page, or simply click the Next link.

You can specify your preferred number of results per page (10, 20, 30, 50, or 100), along with whether you want results to open in the current window or a new browser window.

Results Window

You can choose to open search results in a new browser window—handy for keeping your search results in place. If you’ve ever clicked from site to site only to find you’ve completely lost the page of results you’d like to return to, try enabling this option.

Settings for Researchers

For the purpose of research, it’s best to have as many search results as possible on the page. Because it’s all text, it doesn’t take that much longer to load 100 results than it does to load 10. If you have a computer with a decent amount of memory, it’s also good to have search results open in a new window, which will keep you from losing your place and leave you a window with all the search results readily available.

If you can stand it, turn off filtering, or at least limit the filtering to moderate instead of strict. Machine filtering is not perfect and, unfortunately, enabling it might mean that you’ll miss something valuable. This is especially true when you’re searching for a phrase that might be caught by a filter, such as “breast cancer.”

Unless you’re absolutely sure you always want to do a search in one language, I advise against setting your language preferences on this page. Instead, alter language preferences as needed using the Google Language Tools [“Language Tools” earlier in this chapter].

Between the simple search, advanced search, and preferences, you have all the tools necessary to build the Google query to suit your particular purposes.

Warning

If cookies are turned off in your browser, setting preferences in Google isn’t going to do you much good. You’ll have to reset them every time you open your browser. If you can’t have cookies and want to use the same preferences every time, consider making a customized search form [Hack #9].

Understanding Google URLs

If you’re like most people, you usually pay little attention to the URLs in your browser’s address bar as you surf from one site to the next. And you might choose to stick with this habit while searching Google. You ought to know, however, that a subtle alteration made to the URL that Google returns after a search can be an efficient method of tweaking your result set. In fact, there’s at least one thing you can do by fiddling with (we like to call it hacking) the URL that you can do no other way, and there are quick tricks that might save you a trip back to the Advanced Search page.

Say you want to search for three blind mice. The URL of the page of results will vary depending on the preferences you’ve set, but it will look something like this:

http://www.google.com/search?num=100&hl=en&q=%22three+blind+mice%22

The query itself—q=%22three+blind+mice%22, %22 being a URL-encoded " (double quote)—is pretty obvious, but let’s break down what those extra bits mean.

num=100 refers to the number of search results per page—100 in this case. Google accepts any number from 1 to 100. Altering the value of num is a nice shortcut to altering the preferred size of your result set without having to meander over to the Advanced Search page and rerun your search.

Don’t see the num= in your URL? Simply append it by clicking at the end of the URL in your browser’s address bar and typing it in. To set the number of results per page to 20, for instance, add &num=20.
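Because these modifiers are ordinary query-string parameters, you can also build or edit such URLs programmatically. This Python sketch uses the standard library's urllib.parse. The helper names build_search_url and set_modifier are hypothetical, and the 1 to 100 range check simply mirrors the limit on num described above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_search_url(query: str, num: int = 10, hl: str = "en") -> str:
    """Build a results URL from a query plus the num and hl modifiers."""
    if not 1 <= num <= 100:
        raise ValueError("num must be between 1 and 100")
    params = {"num": num, "hl": hl, "q": query}
    # urlencode escapes the double quotes as %22 and the spaces as +.
    return "http://www.google.com/search?" + urlencode(params)


def set_modifier(url: str, key: str, value: str) -> str:
    """Change a modifier in place, or append it if the URL doesn't have it yet."""
    parts = urlsplit(url)
    pairs, replaced = [], False
    for k, v in parse_qsl(parts.query, keep_blank_values=True):
        if k == key:
            if not replaced:
                pairs.append((k, value))
                replaced = True
            continue  # drop any later duplicates of the same modifier
        pairs.append((k, v))
    if not replaced:
        pairs.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


url = build_search_url('"three blind mice"', num=100)
print(url)  # http://www.google.com/search?num=100&hl=en&q=%22three+blind+mice%22
print(set_modifier(url, "num", "20"))
```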

Tip

You can add or alter any of the modifiers described here by appending them to the URL or changing their values—the part after the = (equals)—to something within the accepted range for the modifier in question. If you’re adding a modifier, you must use an & (ampersand) too. Look at how the modifiers are joined together on URLs for other search results to see how it’s done.

hl=en refers to the interface language (the language in which you use Google, reflected in the home page, messages, and buttons). Here, it’s in English. Google’s Language Tools [“Language Tools” earlier in this chapter] page provides a list of language choices. Run your mouse over each language choice and notice the change reflected in the URL. The URL for Pig Latin looks like this:

http://www.google.com/intl/xx-piglatin/

The language code is the bit between intl/ and the final / (xx-piglatin, in this case). Apply this to the search URL at hand by altering the existing value of hl:

hl=xx-piglatin

What if you put multiple hl modifiers in a result URL? Google honors whichever comes last, reading from left to right. While it makes for confusing URLs, this means you can always resort to laziness and add an extra modifier at the end rather than editing what’s already there, like so:

http://www.google.com/search?num=100&hl=en&q=%22three+blind+mice%22&hl=xx-piglatin
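The last-one-wins rule is easy to model. This Python sketch appends a modifier the lazy way and then resolves repeated modifiers by reading left to right, letting later values overwrite earlier ones. It illustrates the rule described above; lazy_append and effective_modifiers are hypothetical helpers, not Google's own parser.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def lazy_append(url: str, key: str, value: str) -> str:
    """Tack a modifier onto the end of a URL without touching what's already there."""
    return url + "&" + urlencode({key: value})


def effective_modifiers(url: str) -> dict:
    """Resolve repeated modifiers left to right; the last value for each key wins."""
    resolved = {}
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        resolved[key] = value  # a later occurrence overwrites an earlier one
    return resolved


base = "http://www.google.com/search?num=100&hl=en&q=%22three+blind+mice%22"
lazy = lazy_append(base, "hl", "xx-piglatin")
print(lazy)
print(effective_modifiers(lazy)["hl"])  # xx-piglatin
```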

There’s one more modifier that, appended to your URL, may provide some useful modifications of your results:

safe=off

Means the SafeSearch filter is off. The SafeSearch filter removes search results of a sexually explicit nature. safe=on means the SafeSearch filter is on.

Playing about with Google’s URLs [Hack #17] might not seem like the most intuitive way to get results quickly, but it’s much faster than reloading the Advanced Search form.

Browse the Google Directory

Google has a searchable subject index in addition to its Web Search.

Google’s Web Search indexes billions of pages, which means it isn’t suitable for all searches. When you have a search that you can’t narrow down—for example, if you’re looking for information on a person about whom you know nothing—billions of pages will get very frustrating very quickly.

But you don’t have to limit your searches to the Web. Google also has a searchable subject index, the Google Directory, at http://directory.google.com. Rather than indexing the full text of billions of pages, the directory describes sites, indexing about five million URLs. This makes it a much better search for general topics.

Does Google spend time building a searchable subject index in addition to a full-text index? No, Google bases its directory on the Open Directory Project data at http://dmoz.org/. Unlike the results at the standard Google Web Search, the collection of URLs at the Open Directory Project is gathered and maintained by a group of human volunteers rather than automatic algorithms, but Google does add some of its own Googlish magic to it.

As you can see in Figure 1-6, the front of the site is organized into several topics. To find what you’re looking for, you can either do a keyword search or drill down through the hierarchies of subjects.

Figure 1-6. The Google Directory

Beside most listings, as shown in Figure 1-7, you’ll see a green bar. The green bar is an approximate indicator of the site’s PageRank in the Google search engine. (Not every listing in the Google Directory has a corresponding PageRank in the Google web index.) Web sites are listed by default in order of Google PageRank, but you also have the option to list them in alphabetical order.

Figure 1-7. Individual listings under Science > Physics > Quantum Mechanics > People > Feynman, Richard

One thing you’ll notice about the Google Directory is how the annotations and other information vary between categories. This is because the information in the directory is maintained by a small army of thousands of volunteers who are each responsible for one or more categories. For the most part, annotation is pretty good.

Searching Versus Browsing

There are two different kinds of shoppers, and they illustrate the difference between searching and browsing. Some shoppers know exactly what they’re after, and they want to find a store with the item, locate the item, and purchase it as quickly as possible. As with a web search, it helps to know a bit about what you’re looking for if this is your style.

Other shoppers want to explore a particular store, see what the store offers, and choose an item if the right one comes along. This style of browsing is suited for people who want to get a larger survey of items in a particular category before they necessarily know what they’re looking for.

If you were interested in looking at sites about child psychology, you might try a search at http://search.google.com with the query child psychology. You would find thousands of sites in the search results, along with news articles about child psychology, college papers about the topic, and even pages that mention the terms child and psychology without relating to the topic. But browsing the Child Psychology category in the Google Directory (http://directory.google.com/Top/Science/Social_Sciences/Psychology/Child_Psychology/) gives you hundreds of links selected by Open Directory volunteers as being relevant to the topic.

There are still times when you need to search the directory, and Google has provided a couple of ways to accomplish this.

Searching the Google Directory

Because the Google Directory is a far smaller collection of URLs, ideal for more general searching, it does not have the various complicated special syntaxes for searching that the Web Search does. However, there are a couple of special syntaxes that you should know about:

intitle:

Just like the Google web special syntax, intitle: restricts the query word search to the title of a page.

inurl:

inurl: restricts the query word search to the URL of a page.

When you’re searching on Google’s web index, your overwhelming concern is probably how to reduce your list of search results to something manageable. With that in mind, you might start with the narrowest possible search.

That’s a reasonable strategy for the web index, but because you have a narrower pool of sites in the Google Directory, you want that search to be more general.

For example, say you were looking for information on author P. G. Wodehouse. A simple search on P. G. Wodehouse in Google’s web index gets you over 1,100,000 results, possibly compelling you to immediately narrow your search. But doing the same search in the Google Directory returns only 176 results. You might consider that a manageable number of results, or you might want to carefully narrow your results further.

The Directory is also good for searching for events. A Google web search for Korean War will find over 24 million results, while searching the Google Directory will find just over 138,000. This is a case where you will probably need to narrow your search. Use general words indicating what kind of information you want—timeline, for example, or archives, or lesson plans. Don’t narrow your search with names or locations; that’s not the best way to use the Google Directory.

Glean a Snapshot of Google in Time

Google Zeitgeist provides a weekly, monthly, and yearly overview of what the Web was interested in.

Turning to Google itself for a definition of zeitgeist (define:zeitgeist), there’s consensus that it refers to “the spirit of the times.” And Google Zeitgeist (http://www.google.com/press/zeitgeist.html) is just that: a mirror that the Web (according to Google) holds up to us, providing a snapshot of the week, month, or year that was.

A typical weekly Google Zeitgeist, shown in Figure 1-8, lists the top 15 gaining queries.

Figure 1-8. The week’s top 15 gaining queries

It takes only a few moments of visiting Google Zeitgeist before you’re itching to go back a little further in time: the week your second child was born, the month during which the Olympics were held, the year you graduated from high school. Click the Archive link to choose any year from the Google Zeitgeist Archive and display links such as those shown in Figure 1-9 for every week, month, and year since January 2001.

Tip

Weekly Zeitgeist updates actually started in June 2001, at the same time the monthlies switched from PDF to HTML format. In August 2005, Google stopped listing declining queries and started listing 5 more of the top gaining queries, bringing the total to 15.

Figure 1-9. The Zeitgeist Archive pages, displaying weekly, monthly, and year-end reports dating back to 2001

Monthly reports provide some information about Google News queries and Google Image Search queries, and you can find monthly reports for countries around the world by clicking the Zeitgeist Around the World link on the front page. Year-end reports provide even more detail with trend graphs.

While Google Zeitgeist’s statistics aren’t earth-shattering (e.g., searches for iraq more than doubled on March 19, 2003, the date that Operation Iraqi Freedom began—imagine that!), it does provide a snapshot of what the world in aggregate found interesting enough to look up.

See Also

  • If Google Zeitgeist piques your interest, you might also try the Yahoo! Buzz Index (http://buzz.yahoo.com), a similar collection of statistics around popular Yahoo! searches: the day’s top movers (overall and by various Yahoo! categories), most viewed and emailed Yahoo! news items, and a market trend–like chart (click the View Complete Chart... link associated with any of the buzz listings on the front page) of leaders and movers, according to buzz score (http://help.yahoo.com/help/us/buzz/#buzz-04).

  • Google Trends (http://www.google.com/trends) is a new product from Google Labs that graphs the mentions of words or phrases over time. Type in two words separated by commas to get a quick visual sense of their relative popularity. For example, “Google, Yahoo” shows you which search engine is mentioned more across time, regions, news stories, and languages.

Visualize Google Results

The TouchGraph Google Browser is the perfect Google complement for those who appreciate visual displays of information.

Some people are born text crawlers. They can retrieve the mostly text resources of the Internet and browse them happily for hours. But others are more visually oriented and find that the flat text results of the Internet leave something to be desired, especially when it comes to search results.

If you’re the type that appreciates visual displays of information, you’re bound to like the TouchGraph Google Browser (http://www.touchgraph.com/TGGoogleBrowser.html). This Java applet allows you to start with pages that are similar to one URL, and then expand outward to pages that are similar to the first set of pages, on and on, until you have a giant map of nodes (a.k.a. URLs) on your screen.

Tip

The TouchGraph Google Browser was created by Alex Shapiro (http://www.touchgraph.com/).

Note that you’re finding URLs that are similar to another URL, just as you would if you used the related: syntax. You aren’t doing a keyword search, and you’re not using the link: syntax. You’re searching by Google’s measure of similarity.

Starting to Browse

Start your journey by entering a URL on the TouchGraph home page and clicking the Graph It link. Your browser will launch the TouchGraph Java applet, covering your window with a large mass of linked nodes, as shown in Figure 1-10.

Figure 1-10. Mass of linked nodes generated by TouchGraph

Tip

You’ll need a web browser capable of running Java applets. If Java support in your preferred browser comes in the form of a plug-in, your browser should have the smarts to launch a plug-in locator/downloader and walk you through the installation process.

If you’re easily entertained like me, you might amuse yourself for a while just by clicking and dragging the nodes around. But there’s more to do than that.

Expanding Your View

Hold your mouse over one of the items in the group of pages. A little box labeled info pops up. Click on that, and a box of information about that particular node appears, as shown in Figure 1-11.

Figure 1-11. Node information pop-up box

The box of information contains the title, snippet, and URL: pretty much everything you’d get from a regular search result. Click on the URL in the box to open that URL’s web page itself in another browser window. If your browser is set to block pop-up windows, you might need to allow them for the touchgraph.com domain.

Not interested in visiting web pages just yet? Want to do some more search visualization? Double-click on one of the nodes. TouchGraph uses the Google API to request pages similar to the URL of the node you double-clicked. Keep double-clicking at will; when no more pages are available, a green C will appear when you put your mouse over the node (no more than 30 results are available for each node). If you do this often enough, you’ll end up with a screen full of nodes with lines denoting their relationship to one another, as Figure 1-12 shows.

Figure 1-12. Node mass expanded by double-clicking on nodes

Visualization Options

Once you’ve generated similarity page listings for a few different sites, you’ll find yourself with a pretty crowded page. TouchGraph has a few options to change the look of what you’re viewing.

For each node, you can show page title, page URL, or point (the first two letters of the title). If you’re just browsing page relationships, the title is probably best. However, if you’ve been working with the applet for a while and have mapped out a plethora of nodes, the point or URL options can save some space. The URL option removes the www and .com from the URL, leaving the other domain suffixes. For example, www.perl.com shows as perl, while www.perl.org shows as perl.org.
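Both compact labels come down to simple string operations. Here is a small Python sketch, assuming the behavior is exactly as described above: drop a leading www. and a trailing .com for the URL option, and take the first two letters of the title for the point option. The function names are hypothetical, and TouchGraph's own code may handle more cases.

```python
def url_label(host: str) -> str:
    """Shorten a host name as the URL option is described: drop www. and .com."""
    label = host.lower()
    if label.startswith("www."):
        label = label[len("www."):]
    if label.endswith(".com"):
        label = label[: -len(".com")]
    return label


def point_label(title: str) -> str:
    """The point option: the first two letters of the page title."""
    return title[:2]


for host in ("www.perl.com", "www.perl.org"):
    print(host, "->", url_label(host))
```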

Speaking of saving space, there’s a zoom slider at the upper right of the applet window. After you’ve generated several distinct groups of nodes, zooming out allows you to see the different groupings more clearly. However, it becomes difficult to see relationships between the nodes in the different groups.

To customize the display even further, click the Advanced button to see more TouchGraph options. You’ll find the option to view the singles: the nodes in a group that have a relationship with only one other node. This option is off by default; check the Show Singles checkbox to turn it on. I find it’s better to leave out singles; they crowd the page and make it difficult to establish and explore separate groups of nodes.

The Radius setting specifies how many nodes will be displayed around the node you’ve clicked. A radius of 1 will show all nodes directly linked to the node you’ve clicked, a radius of 2 will show all nodes directly linked to the node you’ve clicked as well as all nodes directly linked to those nodes, and so on. The higher the radius, the more crowded things get. The groupings do, however, tend to settle themselves into nice little discernible clumps. A drop-down menu beside the Radius setting specifies how many search results (i.e., how many connections) are shown. A setting of 10 is, in my experience, optimal.
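The Radius setting can be modeled as a breadth-first walk outward from the clicked node that stops a fixed number of links out. Here is a minimal Perl sketch of that idea, not TouchGraph’s actual code: the link data is a made-up hash mapping each node to the nodes it is directly linked to, and nodes_within_radius is an illustrative name.

```perl
#!/usr/bin/perl
# radius.pl -- a sketch of which nodes a given Radius setting would show.
# Not TouchGraph's code; the link data below is hypothetical.
use strict;
use warnings;

# Breadth-first search from $start, stopping $radius links out.
# $links is a hash reference: node => [ directly linked nodes ].
sub nodes_within_radius {
    my ($links, $start, $radius) = @_;
    my %depth = ($start => 0);    # distance (in links) from the start node
    my @queue = ($start);
    while (@queue) {
        my $node = shift @queue;
        next if $depth{$node} >= $radius;    # don't expand past the radius
        for my $next (@{ $links->{$node} || [] }) {
            next if exists $depth{$next};    # already reached by a shorter path
            $depth{$next} = $depth{$node} + 1;
            push @queue, $next;
        }
    }
    return sort keys %depth;
}

# Hypothetical similarity links between five nodes.
my %links = (
    a => [ 'b', 'c' ],
    b => [ 'a', 'd' ],
    c => [ 'a' ],
    d => [ 'b', 'e' ],
    e => [ 'd' ],
);

for my $radius (1, 2) {
    print "Radius $radius: ",
        join(', ', nodes_within_radius(\%links, 'a', $radius)), "\n";
}
```

With this sample data, a radius of 1 shows a, b, and c; a radius of 2 adds d, which is linked to b. Each extra step of radius pulls in another ring of neighbors, which is why the display gets crowded so quickly.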

For a look at all the ways you can customize the TouchGraph Google browser, be sure to check out the Full Instructions page at http://www.touchgraph.com/TGGB_FullInstructions.html.

Making the Most of These Visualizations

Yes, it’s cool. Yes, it’s unusual. And yes, it’s fun dragging those little nodes around. But what exactly is the TouchGraph good for?

TouchGraph does two rather useful things. First, it allows you to see at a glance the similarity relationships among large groups of URLs, something you can’t get from several flat lists of results for similar-page queries. Second, if you do some exploring, you can sometimes get a list of companies in the same industry or area. This comes in handy when you’re researching a particular industry or topic. It’ll take time, though, so keep trying.

Check Your Spelling

Google sometimes takes the liberty of “correcting” what it perceives to be a spelling error in your query.

Most of us couldn’t communicate with the outside world without a spellchecker. As you send off an email or put the finishing touches on a document, a trusty spellchecker makes sure you haven’t made any blatant errors. Google also has a built-in spellchecker, and when Google thinks it can spell individual words or complete phrases in your search query better than you can, it suggests a “better” search, hyperlinking it directly to a query.

For example, if you search for hydrecefallus, Google will ask if you meant hydrocephalus, as shown in Figure 1-13.


Figure 1-13. Offering spelling suggestions when Google thinks it knows better

Suggestions aside, Google assumes that you know of what you speak and returns results for your query as typed, provided it finds any.

If your query found no results for the spellings you provided and Google believes it knows better, it will automatically run a new search using its own suggestion. Thus, a search for hydrecefallus that finds (hopefully) no results sparks a Google-initiated search for hydrocephalus.

Mind you, Google does not arbitrarily come up with its suggestions, but builds them based on its own database of words and phrases found while indexing the Web. If you search for nonsense like kweghgjdlsggaa, you’ll get no results and be offered no suggestions.

This is a lovely side effect and a quick and easy way to check the relative frequency of spellings. Query for a particular spelling, and note the number of results. Then click on Google’s suggested spelling and note the number of results. It’s surprising how close the counts are sometimes, indicating an oft-misspelled word or phrase.
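Comparing the two counts is simple arithmetic. Here is a minimal sketch, assuming you note each spelling’s result count by hand; the script does not contact Google, and compare_counts.pl and spelling_shares are illustrative names.

```perl
#!/usr/bin/perl
# compare_counts.pl -- a sketch: compare hit counts you noted by hand
# for two or more spellings. Nothing here contacts Google.
use strict;
use warnings;

# Return each spelling's share of the combined count, as a percentage.
sub spelling_shares {
    my (%counts) = @_;
    my $total = 0;
    $total += $_ for values %counts;
    die "No results to compare\n" unless $total;
    my %share;
    for my $spelling (keys %counts) {
        $share{$spelling} = 100 * $counts{$spelling} / $total;
    }
    return %share;
}

# Usage: perl compare_counts.pl "spelling one=count" "spelling two=count" ...
if (@ARGV) {
    my %counts = map { my ($k, $v) = split /=/, $_, 2; ($k, $v) } @ARGV;
    my %share  = spelling_shares(%counts);
    for my $s (sort { $share{$b} <=> $share{$a} } keys %share) {
        printf "%-20s %6.1f%%\n", $s, $share{$s};
    }
}
```

Run it with the counts you saw, for example perl compare_counts.pl "Bill Mauldin=1200" "Bill Maudlin=400" (placeholder numbers, not real result counts), and it lists each spelling’s share, largest first. Shares close to 50/50 suggest a widely misspelled word.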

Tip

If you find yourself turning to Google to compare spellings, you might want to automate the process of comparing phrases [Hack #26].

Embrace Misspellings

Don’t make the mistake of automatically dismissing the proffered results from a misspelled word, particularly a proper name. I’ve been a fan of cartoonist Bill Mauldin for years now, but I repeatedly misspell his name as “Bill Maudlin.” And judging from a quick Google search, I’m not the only one. There is no law stating that every page must be spellchecked before it goes online, so it’s often worth taking a look at results despite misspellings.

As an experiment, try searching for two misspelled words on a related topic, such as normotensive hydrocephalis. What kind of information did you get? Could the information you got, if any, be grouped into a particular online genre?

At the time of this writing, the search for normotensive hydrocephalis gets only three results. The content here is generally from people dealing with various neurosurgical problems. Again, there is no law that states that all web materials have to be spellchecked.

Use this to your advantage as a researcher. When you’re looking for layman accounts of illness and injury, the content you desire might actually be more often misspelled than not. On the other hand, when looking for highly technical information or references from credible sources, filtering out misspelled queries will bring you closer to the information you seek.

Spelling on the Command Line

The fact that Google gathers its spellings from across the Web instead of a dictionary means it can out-spell most email and word-processor spellcheckers. An email spellchecker won’t catch that you’ve just misspelled the name of comedian Dave Shapel (or is it Dave Chapelle?), while Google’s spellchecker will catch the error.

While this hack won’t replace your standard spellcheckers with Google, the code in this section will show you how to bring the spellchecker a bit closer to your desktop.

The code

This code contacts the Google API and asks for a spelling suggestion for the supplied word or phrase. If you’re not already accustomed to using the command line to get things done, this hack probably won’t make contacting Google any easier than opening a web browser. But for command-line junkies, it’s a quick way to tap the power of Google spelling.

Save the following code as spell.pl, and be sure to replace “insert your key” with your own Google API key:

#!/usr/local/bin/perl
# spell.pl
# Contact Google for spelling suggestions!
# Usage: perl spell.pl <query>

use strict;
use warnings;

# Use the SOAP::Lite Perl module.
use SOAP::Lite;

# Your Google API developer's key.
my $google_key = 'insert your key';

# Location of the GoogleSearch WSDL file.
my $google_wsdl = "./GoogleSearch.wsdl";

# Take the query from the command line.
my $query = join(' ', @ARGV) or die "Usage: perl spell.pl <query>\n";

# Create a new SOAP::Lite instance, feeding it GoogleSearch.wsdl.
my $google_search = SOAP::Lite->service("file:$google_wsdl");

# Ask Google for a spelling suggestion.
my $results = $google_search->doSpellingSuggestion($google_key, $query);

# Print the suggestion, if any; a correctly spelled query returns nothing.
print "$results\n" if $results;

This script is similar to any bare-bones Perl script [Hack #90] for contacting the Google API, but it uses the doSpellingSuggestion method instead of the standard search method.

Running the hack

Run the script from the command line, passing in any word or phrase you want to check, like this:

% perl spell.pl insert word or phrase

By passing in Dave Shapel, you can see how Google suggests you spell his name:

% perl spell.pl Dave Shapel
Dave Chapelle

If you pass in a correct spelling, the script simply returns no suggestions at all.

You still need to figure out which words are questionable to use this script, but when you need to double-check a name or phrase quickly, you can think of Google as your own personal lexiconographer (or is that lexicographer?).

Google Phonebook: Let Google’s Fingers Do the Walking

Google makes an excellent phonebook, even to the extent of doing reverse lookups.

Google combines residential and business phone number information and its own excellent interface to offer a phonebook lookup that provides listings for businesses and residences in the United States. However, the search comes in three different syntaxes, supplying different levels of information yields different results, the syntaxes are finicky, and Google doesn’t document any of it.

The Three Syntaxes

Google offers three ways to search its phonebook:

phonebook

Searches the entire Google phonebook

rphonebook

Searches residential listings only

bphonebook

Searches business listings only

Tip

The result page for phonebook: lookups lists only five results for both residential and business numbers. The more specific rphonebook: and bphonebook: searches provide up to 30 results per page. For a better chance of finding what you’re looking for, use the appropriate targeted lookup.

Using the Syntaxes

Using a standard phonebook requires knowing quite a bit of information about what you’re looking for: first name, last name, city, and state. Google’s phonebook requires no more than last name and state to get started. Casting a wide net for all the Smiths in California is as simple as:

phonebook:smith ca

Try giving 411 a whirl with that request! Figure 1-14 shows the results of the query.


Figure 1-14. Results of a phonebook: query

Notice that while intuition might tell you that there are thousands of Smiths in California, the Google phonebook says that there are only 600. Just as Google’s regular search engine maxes out near 1,000 results, its phonebook maxes out at 600. Fair enough. Try narrowing your search by adding a first name, city, or both:

phonebook:john smith los angeles ca

At the time of this writing, the Google phonebook found 2 business and 20 residential listings for John Smith in Los Angeles, California.

Caveats

The phonebook syntaxes are powerful and useful, but they can be difficult to use if you don’t remember a few things about how they work.

Syntaxes are case-sensitive

Searching for phonebook:john doe ca works, while Phonebook:john doe ca (notice the capital P) doesn’t.

Wildcards don’t work

Then again, they’re not needed, since the Google phonebook does all the wildcarding for you. For example, if you want to find shops in New York with “Coffee” in the title, don’t bother trying to envision every permutation of “Coffee Shop,” “Coffee House,” and so on. Just search for bphonebook:coffee new york ny and you’ll get a list of all businesses in New York whose names contain the word “coffee.”

Exclusions don’t work

Perhaps you want to find coffee shops that aren’t Starbucks. You might think phonebook:coffee -starbucks new york ny would do the trick. After all, you’re searching for coffee and not Starbucks, right? Unfortunately not; Google thinks you’re looking for both the words “coffee” and “starbucks,” yielding just the opposite of what you were hoping for: everything Starbucks in NYC.

OR doesn’t always work

You might be wondering if Google’s phonebook accepts OR lookups. You then might experiment, trying to find all the coffee shops in Rhode Island or Hawaii: bphonebook:coffee (ri | hi). Unfortunately, that doesn’t work; the only listings you’ll get are for coffee shops in Hawaii. This is because Google doesn’t see the (ri | hi) as a state code, but rather as another element of the search.

So, if you reverse the previous search and search for bphonebook:coffee (hi | ri), Google finds listings that contain the word “coffee” and either the string “hi” or the string “ri.” This means you’ll find Hi-Tide Coffee (in Massachusetts) and several coffee shops in Rhode Island.

It’s neater to use OR in the middle of your query and specify a state at the end. For example, if you want to find coffee shops that sell either donuts or bagels, this query works fine: bphonebook:coffee (donuts | bagels) ma. It finds stores in Massachusetts that contain the word “coffee” and either the word “donuts” or the word “bagels.” The bottom line: you can use an OR query on the store or resident name, but not on the location.
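Because the syntaxes are so finicky, it can help to assemble phonebook queries programmatically. The following Perl sketch uses a hypothetical helper, phonebook_query, that encodes the caveats above: the operator is forced to lowercase, exclusions are rejected, OR alternatives apply only to the name, and the location always comes last. It only builds the query string; you still run the search yourself.

```perl
#!/usr/bin/perl
# A sketch: build phonebook queries that respect the caveats above.
# phonebook_query is a hypothetical helper, not part of any Google API.
use strict;
use warnings;

sub phonebook_query {
    my (%args) = @_;

    # Syntaxes are case-sensitive, so force lowercase.
    my $op = lc($args{type} // 'phonebook');
    die "Unknown syntax: $op\n" unless $op =~ /^[rb]?phonebook$/;

    my @terms = @{ $args{name} || [] };
    die "Need at least one name term\n" unless @terms;

    # Exclusions don't work; Google treats -word as a word to look for.
    die "Exclusions don't work in the phonebook\n" if grep { /^-/ } @terms;

    # OR works on the name, so alternatives become one (a | b) group.
    my @any = @{ $args{any_of} || [] };
    push @terms, '(' . join(' | ', @any) . ')' if @any;

    # The location (city and/or state) goes at the end, never inside an OR.
    my $first = shift @terms;
    my @parts = ("$op:$first", @terms);
    push @parts, $args{location}
        if defined $args{location} && length $args{location};
    return join ' ', @parts;
}

print phonebook_query(
    type     => 'bphonebook',
    name     => ['coffee'],
    any_of   => [ 'donuts', 'bagels' ],
    location => 'ma',
), "\n";
```

This prints bphonebook:coffee (donuts | bagels) ma, the donuts-or-bagels query from the OR example. Asking for type => 'Phonebook' still yields a working lowercase phonebook: query.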

Tip

Try some phonebook lookups that you can’t do by dialing 411. For example, try searching by last name and area code, or last name and zip code! Google’s phonebook lookup is very accommodating.

Reverse Phonebook Lookup

All three phonebook syntaxes support reverse lookup, though it’s probably best to use the general phonebook: syntax so that a number isn’t missed because it’s classified as residential rather than business (or vice versa).

To do a reverse search, just enter the phone number with its area code. Lookups without the area code won’t work:

phonebook:(707) 827-7000

(This is the phone number of O’Reilly world headquarters in Sebastopol, California, USA.)
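Since a reverse lookup needs the full number, including the area code, here is a small Perl sketch that normalizes a number typed in any common format into the same form as the example above and refuses numbers without an area code. reverse_lookup_query is a hypothetical helper, and it assumes ten-digit North American numbers with an optional leading 1; it builds the query string only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a U.S. phone number in any common format
# into a phonebook: reverse-lookup query. Numbers without an area code
# are rejected, since those lookups don't work.
sub reverse_lookup_query {
    my ($number) = @_;
    (my $digits = $number) =~ s/\D//g;     # keep digits only
    $digits =~ s/^1(?=\d{10}$)//;          # drop a leading country code 1
    die "Need a 10-digit number including area code: $number\n"
        unless $digits =~ /^(\d{3})(\d{3})(\d{4})$/;
    return "phonebook:($1) $2-$3";
}

print reverse_lookup_query('707.827.7000'), "\n";
```

This prints phonebook:(707) 827-7000, while something like 827-7000 dies with a reminder to add the area code.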

Tip

Keep in mind that Google’s phonebook service doesn’t include cell phone numbers.

Reverse lookups on Google are a hit-or-miss proposition and don’t always produce results. If you’re not having any luck, consider using a more dedicated phonebook site such as WhitePages.com (http://www.whitepages.com).

Look Up Definitions

Do you find yourself smiling knowingly when your boss mentions that well-known business principle you’ve never heard of? Overwhelmed with “geek speak”? Chances are Google’s heard it mentioned—and possibly even defined—somewhere before.

Most specialized vocabularies remain, for the most part, fairly static; words don’t suddenly change their meaning all that often. Not so with technical and computer-related jargon. It seems like every 12 seconds someone comes up with a new buzzword or term relating to computers or the Internet, and then 12 minutes later it becomes obsolete or means something completely different—often more than one thing at a time. Maybe it’s not that bad. It just feels that way.

Google can help you in two ways: by helping you look up words and by helping you figure out what words you don’t know but need to know.

Google Definitions

Before you assume you’re going to be in for a lot of Googling, try the define search syntax mentioned in the “Quick Links” section earlier in this chapter. Simply precede the term you want defined with the special syntax keyword define, like so:

define google juice
define julienne
define 42

Google tells you that these are defined as “power of a website to turn up in Google,” “cut food into thin sticks,” and “being two more than forty,” thanks to Wikipedia, Low Carb Luxury, and WordNet at Princeton, respectively.

Click the associated “Definition in context” link to visit the page from which the definition was drawn.

Click the “Web definitions for...” link, or prefix the word you’re defining with define: (note the addition of a colon) in the first place, and you’ll net a full page of definitions drawn from all manner of places. For instance, define:TLA turns up oodles of definitions (all about the same, mind you), as shown in Figure 1-15.


Figure 1-15. A page chock-full of definitions for TLA

Tip

The define word syntax is still subject to spelling suggestions, so you don’t have to worry too much about misspelling. The define:word form, however, doesn’t perform a web search at all, so it returns no results or spelling suggestions whatsoever if it finds no definitions to offer you.

If all that didn’t turn up anything useful, move on to Google Web Search proper.

Slang

We have distinctive speech patterns that are shaped by our educations, our families, and our location. Further, we may use another set of words based on our occupation. When a teenager says something is “phat,” that’s slang: a specialized vocabulary used by a particular group. When a copywriter scribbles “stet” on an ad, that’s not slang, but it’s still specialized vocabulary or jargon used by a certain group, in this case the advertising industry.

Being aware of these specialty words can make all the difference when it comes to searching. Adding specialized words to your search query—whether slang or industry jargon—can really change the slant of your search results.

Slang gives you one more way to break up your search engine results into geographically distinct areas. There’s some geographical blurriness when you use slang to narrow your search engine results, but it’s amazing how well it works. For example, search Google for football. Now search for football bloke. Totally different result sets, aren’t they? Search for football bloke bonce. Now you’re into soccer narratives.

Of course, this is not to say that everyone in England automatically uses the word “bloke” any more than everyone in the southern U.S. automatically uses the word “y’all.” But adding well-chosen bits of slang (which will take some experimentation) gives your search results a whole different tenor and may point you in unexpected directions. You can find slang from the following resources:

The Probert Encyclopedia—Slang (http://www.probertencyclopaedia.com/slang.htm)

This site is browseable by first letter or searchable by keyword. (Note that the keyword search covers the entire Probert Encyclopedia; slang results are near the bottom.) The slang presented here is from all over the world. It’s often cross-linked, especially drug slang. As with most slang dictionaries, this site contains material that might offend.

A Dictionary of Slang (http://www.peevish.co.uk/slang/)

This site focuses on slang heard in the United Kingdom, which includes slang from other places as well. It’s browseable by letter or via a search engine. Words from outside the UK are marked with their place of origin in brackets. Definitions also indicate typical usage: humorous, vulgar, derogatory, etc.

Surfing for Slang (http://www.spraakservice.net/slangportal)

Of course, each area in the world has its own slang. This site has a good metalist of English and Scandinavian slang resources.

Urban Dictionary (http://www.urbandictionary.com)

You can browse this collaborative dictionary by word and find dozens or hundreds of definitions for each word. The definitions are added by site visitors, and each definition is open to votes from other visitors. The most widely accepted definitions for each word bubble up to the top.

Start by searching Google for your query without the slang. Check the results and decide where they’re falling short. Are they not specific enough? Are they not located in the right geographical area? Are they not covering the right demographic—teenagers, for example?

Introduce one slang word at a time. For example, in a search for football, add the word bonce and check the results. If they’re not narrow enough, add the word bloke. Add one word at a time until you get the results you want. Using slang is an inexact science, so you have to do some experimenting.
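The one-word-at-a-time approach is easy to lay out ahead of time. This minimal Perl sketch (progressive_queries is an illustrative name) lists each successive query, starting with the plain query and adding one word per step; you run each in Google and stop once the results look right.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the base query, then the base plus each extra word, cumulatively.
sub progressive_queries {
    my ($base, @extra) = @_;
    my @queries = ($base);
    for my $word (@extra) {
        push @queries, "$queries[-1] $word";    # previous query plus one word
    }
    return @queries;
}

print "$_\n" for progressive_queries('football', 'bonce', 'bloke');
```

This prints football, then football bonce, then football bonce bloke. The same stepwise narrowing works for specialized vocabulary as well as slang.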

Here are some things to be careful of when using slang in your searches:

  • Try many different slang words.

  • Don’t use slang words that are generally considered offensive, except as a last resort. Your results will be skewed.

  • Be careful when using teenage slang, which changes constantly.

  • Try searching for slang when using Google Groups. Slang crops up often in conversation.

  • Minimize your searches for slang when searching for more formal sources, such as newspaper stories.

  • Don’t use slang phrases if you can help it; in my experience, slang changes too much to be consistently searchable. Stick to established words.

Industrial Slang

Specialized vocabularies are those used in particular subject areas and industries. The medical and legal fields offer good examples, although there are many others.

When you need to tip your search to the more technical, the more specialized, and the more in-depth, think of a specialized vocabulary. For example, do a Google search for heartburn. Now do a search for heartburn GERD. Now do a search for heartburn GERD gastric acid. You’ll see that each is very different.

With some fields, finding specialized-vocabulary resources is a snap. But with others, it’s not that easy. As a jumping-off point, try the Glossarist site at http://www.glossarist.com, which is a searchable subject index of about 6,000 different glossaries covering dozens of different topics. There are also several other large online resources covering certain specialized vocabularies. These resources include:

The On-Line Medical Dictionary (http://cancerweb.ncl.ac.uk/omd/)

This dictionary contains vocabulary relating to biochemistry, cell biology, chemistry, medicine, molecular biology, physics, plant biology, radiobiology, and other sciences and technologies. It currently has over 46,000 listings.

You can browse the dictionary by letter or search it by word. Sometimes you can search for a word that you know (bruise) and find another term that might be more common in medical terminology (contusion). You can also browse the dictionary by subject. Bear in mind that this dictionary is based in the UK, so some spellings may be slightly different for American users (e.g., “tumour” versus “tumor”).

MedTerms.com (http://www.medterms.com)

MedTerms.com has far fewer definitions (around 15,000), but it also has extensive articles from MedicineNet. If you’re starting from absolute square one with your research and need some basic information and vocabulary to get started, search MedicineNet for your term (bruise works well) and then move to MedTerms.com to search for specific words.

Law.com’s legal dictionary (http://dictionary.law.com/lookup2.asp)

Law.com’s legal dictionary is excellent because you can search either words or definitions; you can browse, too. For example, you can search definitions for the word inheritance and get a list of all the entries that contain the word “inheritance.” This is an easy way to get to the words “muniment of title” without knowing the path.

As with slang, add specialized vocabulary slowly—one word at a time—and anticipate that your search results will be narrowed very quickly. For example, take the word “spudding,” often used in association with oil drilling. Searching for spudding by itself finds about 33,900 results on Google. Adding Texas knocks it down to 852 results, and this is still a very general search! Add specialized vocabulary very carefully, or you’ll narrow your search results to the point where you can’t find what you want.

Researching Terminology with Google

First things first: for heaven’s sake, please don’t just plug the abbreviation into the query box! For example, searching for XSLT will net you over 29 million results. While combing through the sites that Google turns up may eventually lead you to a definition, there’s simply more to life than that. Instead, add "stands +for" to the query if it’s an abbreviation or acronym. "XSLT stands +for" returns around 199,000 results, and the first is a tutorial glossary. If you’re still getting too many results ("XML stands +for" gives you around six million results), try adding beginners or newbie to the query. "XML stands +for" beginners brings in 463 results, the fourth being a general, gentle “Introduction to XML.”

If you’re still not getting the results you want, try "What is X?" or "X +is short +for" or "X beginners FAQ", where X is the acronym or term. These should be regarded as second-tier methods, because most sites don’t tend to use phrases such as “What is X?” on their pages, “X is short for” is uncommon language usage, and X might be so new (or so obscure) that it doesn’t yet have a FAQ entry. Then again, your mileage may vary, and it’s worth a shot; there’s a lot of terminology out there.
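The escalation described here can be written down as a list of queries to try in order. This Perl sketch (terminology_queries is an illustrative name) generates them for a given term, first-tier queries first and then the second-tier fallbacks; the ordering mirrors the advice in this section, not anything Google prescribes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Queries to try, in order, when researching an unfamiliar term.
sub terminology_queries {
    my ($term) = @_;
    return (
        qq{"$term stands +for"},              # first choice for abbreviations
        qq{"$term stands +for" beginners},    # still too many results?
        qq{"$term stands +for" newbie},
        qq{"What is $term?"},                 # second-tier fallbacks
        qq{"$term +is short +for"},
        qq{"$term beginners FAQ"},
    );
}

my $term = @ARGV ? $ARGV[0] : 'XSLT';
print "$_\n" for terminology_queries($term);
```

Run without arguments, it prints the six queries for XSLT, starting with "XSLT stands +for".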

If you have hardware- or software-specific, as opposed to hardware- or software-related, terminology, try the word or phrase along with anything you might know about its usage. For example, as a Perl module, DynaLoader is software-specific terminology. That much known, simply give the two words a spin:

DynaLoader Perl

If the results are too advanced (that is, they assume you already know what a DynaLoader is), start playing with the words beginners, newbie, and the like to bring you closer to information for beginners:

DynaLoader Perl Beginners

If you still can’t find the word in Google, there are a few possible causes: perhaps it’s slang specific to your area, your coworkers are playing with your mind, you heard it wrong (or there’s a typo on the printout you got), or it’s very, very new.

Where to Go When It’s Not on Google

Despite your best efforts, you’re not finding good explanations of the terminology on Google. There are a few other sites that might have what you’re looking for:

Whatis (http://whatis.techtarget.com)

A searchable subject index of computer terminology, from software to telecom. This is especially useful if you have a hardware- or software-specific word because the definitions are divided into categories. You can also browse alphabetically. Annotations are good and are often cross-indexed.

Webopedia (http://www.pcwebopaedia.com)

Searchable by keyword or browsable by category. This site also has a list of the newest entries on the front page so that you can check for new words.

Netlingo (http://www.netlingo.com)

This site is more Internet-oriented. It shows up with a frame on the left that contains the words, with the definitions on the right. It includes lots of cross-referencing and really old slang.

Tech Encyclopedia (http://www.techweb.com/encyclopedia/)

Features definitions and information for over 20,000 words. The top 10 terms searched for are listed so you can see if everyone else is as confused as you are. Though entries had before-the-listing and after-the-listing lists of words, I saw only moderate cross-referencing.

Wikipedia (http://www.wikipedia.com)

This public encyclopedia that anyone can edit is surprisingly accurate and up to date with technology slang. Because new entries don’t need to be approved by one or two editors, and because the work of editing is done by thousands of volunteers across disciplines and industries, Wikipedia is constantly evolving with the times.

Geek terminology proliferates almost as quickly as web pages. Don’t worry too much about deliberately keeping up; it’s just about impossible. Instead, use Google as a “ready reference” resource for definitions.

Find Directories of Information

Use Google to find directories, link lists, and other collections of information.

Sometimes you’re more interested in large information collections than scouring for specific bits and bobs. You could always take a stroll through the Google Directory (http://directory.google.com) to see what’s available, but sometimes a topic-specific directory is what you need.

Using Google, there are a couple of different ways to find directories, link lists, and other information collections from across the Web. The first uses Google’s full-word wildcards [“Full-Word Wildcards” earlier in this chapter] and the intitle: syntax [“Special Syntax” earlier in this chapter]. The second is a judicious use of particular keywords.

Title Tags and Wildcards

Pick something you’d like to find collections of information about. We’ll use “trees” as our example. The first thing we look for is any page with the words “directory” and “trees” in its title. In fact, we build in a little buffering for words that might appear between the two using a couple of full-word wildcards (* characters). The resultant query looks something like this:

intitle:"directory * * trees"

This query finds “directories of evergreen trees,” “South African trees,” and of course “directories containing simply trees.”

What if you want to take things up a notch, taxonomically speaking, and find directories of botanical information? Use a combination of intitle: and keywords, like so:

botany intitle:"directory of"

and you get almost 10,000 results. Changing the tenor of the information might be a matter of restricting results to those coming from academic institutions. Appending an edu site specification brings you to:

botany intitle:"directory of" site:edu

This gets you around 150 results, a mixture of resource directories, and, unsurprisingly, directories of university professors.

Mixing these syntaxes works rather well when searching for something that might also be an offline print resource. For example:

cars intitle:"encyclopedia of"

This query pulls in results from Amazon.com and other sites that sell car encyclopedias. Filter out some of the more obvious book finds by tweaking the query slightly:

cars intitle:"encyclopedia of" -site:amazon.com -inurl:book -inurl:products

The query specifies that search results should not come from Amazon.com and should not have the word “products” or “book” in the URL, which eliminates a fair number of online stores. For some interesting finds, play with this query by changing the word “cars” to whatever you like.
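To try the same filters across several topics quickly, you can generate the queries. In this Perl sketch, resource_query is an illustrative helper, and boats and trains are just stand-in topics; it reproduces the cars query and swaps other words in for “cars.”

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build: topic intitle:"..." followed by any -site: and -inurl: exclusions.
sub resource_query {
    my ($topic, %opt) = @_;
    my @parts = ($topic, qq{intitle:"$opt{title}"});
    push @parts, "-site:$_"  for @{ $opt{not_site}  || [] };
    push @parts, "-inurl:$_" for @{ $opt{not_inurl} || [] };
    return join ' ', @parts;
}

# Swap in whatever topics you like for "cars".
for my $topic (qw(cars boats trains)) {
    print resource_query(
        $topic,
        title     => 'encyclopedia of',
        not_site  => ['amazon.com'],
        not_inurl => [ 'book', 'products' ],
    ), "\n";
}
```

The first line printed is exactly the cars query; the others differ only in the leading topic word.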

Tip

Of course, there are many sites that sell books online, but when it comes to injecting “noise” into results when you’re trying to find online resources and research-oriented information, Amazon.com is the biggest offender. If you’re actually looking for books, try +site:amazon.com instead.

If mixing syntaxes doesn’t find the resources you want, there are some clever keyword combinations that might just do the trick.

Finding Searchable Subject Indexes with Google

There are a few major searchable subject indexes and myriad minor ones that deal with a particular topic or idea. You can find the smaller subject indexes by customizing a few generic searches. "what's new" "what's cool" directory, while gleaning a few false results, is a great way to find searchable subject indexes.

directory "gossamer threads" new is an interesting one. Gossamer Threads is the creator of a popular link directory program. This is a good way to find searchable subject indexes without too many false hits.

directory "what's new" categories cool doesn’t work particularly well, because the word “directory” is not a very reliable search term, but you will pull in some things with this query that you might otherwise have missed.

Let’s put a few of these into practice:

"what's new" "what's cool" directory phylum
"what's new" "what's cool" directory carburetor
"what's new" "what's cool" directory "investigative journalism"
"what's new" directory categories gardening
directory "gossamer threads" new sailboats
directory "what's new" categories cool "basset hounds"

The real trick is to use a more general word, but make it unique enough that it applies mostly to your topic and not to many other topics.

Take acupuncture, for instance. Start narrowing it down by topic. What kind of acupuncture? For people or animals? If for people, what kinds of conditions are being treated? If for animals, what kinds of animals? Maybe you should search for "cat acupuncture", or maybe you should search for acupuncture arthritis. If this first round doesn’t narrow the search results enough, keep going. Are you looking for education or treatment? You can skew results one way or the other using the site: syntax. So maybe you want "cat acupuncture" site:com or arthritis acupuncture site:edu. By taking just a few steps to narrow things down, you can get a reasonable number of search results focused around your topic.

Cover Your Bases

Try all possible combinations of your search keywords at once, and find related keywords with Google Sets.

Imagine you have a set of query words but are not sure that they’re the right set; you certainly don’t want to miss any results by picking the wrong combination of keywords, including or excluding the wrong word. But the thought of typing a dozen-plus permutations of keywords has your carpal tunnel flaring up in horror. With some existing tools, you can fine-tune your Google queries by playing with word sets—leading you down paths you might not have discovered.

Search Grid (http://blog.outer-court.com/search-grid), by German programmer Philipp Lenssen, lets you explore a wide range of Google search results by automatically searching for multiple combinations of keywords you specify. This gives you a quick overview of paths you can follow for a given set of keywords. You might, for example, put catsup, mustard, and pickles on the x-axis and relish, onions, and tomatoes on the y-axis, as shown in Figure 1-16.


Figure 1-16. Search Grid populated with keywords to combine

Search Grid combines the keywords (relish catsup, relish mustard, relish pickles, onions catsup, onions mustard, onions pickles, etc.) and provides you with the first result of each possible combination, as shown in Figure 1-17.

Figure 1-17. The first of several different searches, all in one grid

Note that you get nothing but the first result; this is not the tool to use if you want an in-depth search of each query. Instead, it’s meant to give you a bird’s-eye view of how the different combinations of search words impact the query.
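Search Grid’s own source isn’t shown here, so treat the following as a rough Python sketch of the combining step only. It assumes one query per grid cell with the y-axis word first, matching the order listed above. The function name grid_queries is made up for illustration, and fetching each query’s first result is left out.

```python
from itertools import product

def grid_queries(x_axis, y_axis):
    """Return one query per grid cell, y-axis word first (e.g. 'relish catsup')."""
    # product(y_axis, x_axis) walks the grid row by row.
    return [f"{y} {x}" for y, x in product(y_axis, x_axis)]

x_axis = ["catsup", "mustard", "pickles"]
y_axis = ["relish", "onions", "tomatoes"]

for query in grid_queries(x_axis, y_axis):
    print(query)
```

Three words on each axis give nine queries, which is exactly the typing this kind of tool saves you.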

There’s also a version of Search Grid that’s been integrated into a web tool called FindForward (http://www.findforward.com/?t=grid), which gives you screenshots of some Google search results. FindForward requires less typing: enter two to five words for which you want to check possible permutations. You get a large grid of search results, with screenshots available for some of the pages, as shown in Figure 1-18.

Figure 1-18. Search results for keyword combinations — with screenshots!

Note that this grid searches each of your keywords individually (one square for mustard, one for pickles, one for relish) and searches every possible combination of two words (pickles relish, pickles mustard, mustard relish, etc.), but it doesn’t search for three- and four-word permutations. In other words, this tool doesn’t find every last possible permutation of your search. Again, it’s an overview that gives you an idea of how different word combinations can affect your search, and it is not meant to be exhaustive.
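To see how much ground that leaves uncovered, here’s a small Python sketch of the coverage described above: every word on its own plus every two-word combination. It’s an illustration of the arithmetic, not FindForward’s actual code, and the function name singles_and_pairs is hypothetical.

```python
from itertools import combinations

def singles_and_pairs(words):
    """Each word alone, then every unordered two-word combination."""
    queries = list(words)
    queries += [" ".join(pair) for pair in combinations(words, 2)]
    return queries

words = ["mustard", "pickles", "relish"]
queries = singles_and_pairs(words)
print(queries)

# Every non-empty subset of n words would take 2**n - 1 queries.
print(len(queries), "of", 2 ** len(words) - 1, "possible word sets")
```

With five words, this covers 15 of the 31 possible word sets, which is why the grid is best treated as an overview rather than an exhaustive search.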

But why limit yourself to keyword sets that you can dream up? Google has its own tool in development to expand your keyword vocabulary based on a small set of words. Google Sets (http://labs.google.com/sets) allows you to enter several keywords and have Google predict similar keywords in a large or small set. For example, plug catsup, mustard, and pickles into the form and click Large Set. You should see a list of 25 or more words that run the condiment gamut from Lettuce to Black Olives, as shown in Figure 1-19.

Figure 1-19. Google Sets predictions based on a few keywords

You can click any of the words in the set to see a standard Google Search with that word. You can also click Shrink Set to get a list of fewer (but potentially more accurate) items based on your original keywords. Google Sets can be handy if you want to expand your search possibilities but aren’t sure which direction to go. You can even take the keyword suggestions from Google Sets back to the grid tools to see how using them in combination will affect your results.

Use the tools in this hack when you want to get a sense of how different queries will affect your search, when you’re not sure about what set of search words will return the results you’re looking for, and when you want to experiment with expanding your search without having to type several sets of keywords over and over again.

Hack Your Own Google Search Form

Build your own personal, task-specific Google search form.

If you want to do a simple search with Google, you need only the standard Simple Search form (the Google home page). But if you want to craft specific Google searches to use on a regular basis or provide for others, you can simply put together your own personalized search form.

Start with a garden-variety Google search form; something like this will do nicely:

<!-- Search Google -->
<form method="get" action="http://www.google.com/search">
<input type="text" name="q" size=31 maxlength=255 value="">
<input type="submit" name="sa" value="Search Google">
</form>
<!-- Search Google -->

This is a very simple search form. It takes your query and sends it directly to Google, adding nothing to it. But you can embed some variables to alter your search as needed. You can do this in two ways: via hidden variables or by adding more input to your form.

Hidden Variables

As long as you know how to identify a search option in Google, you can add it to your search form via a hidden variable. The fact that it’s hidden just means that form users can’t alter it; they can’t even see it unless they look at the page’s source code. Let’s look at a few examples.

Tip

While it’s perfectly legal HTML to put your hidden variables anywhere between the opening and closing <form> tags, it’s rather tidy and useful to keep them together after all the visible form fields.

File Type

As the name suggests, File Type specifies that your results are filtered by a particular file type (e.g., Word .doc, Adobe .pdf, PowerPoint .ppt, plain text .txt). Add a PowerPoint file type filter, for example, to your search form, like so:

<input type="hidden" name="as_filetype" value="PPT">

Site Search

Narrows your search to specific sites. While a suffix such as .com will work just fine, something more fine-grained such as the example.com domain is probably better suited:

<input type="hidden" name="as_sitesearch" value="example.com">

URL Component

Specifies a particular path component to look for in URLs. This can include a domain name but doesn’t have to. The following tries to tease out documentation in your result set:

<input type="hidden" name="hq" value="inurl:docs">

Date Range

Narrows your search to pages indexed within the stated number of months. Acceptable values are between 1 and 12. Restricting your results to items indexed only within the last seven months is just a matter of adding:

<input type="hidden" name="as_qdr" value="m7">

Number of Results

Indicates the number of results you’d like to appear on each page, specified as a value of num between 1 and 100; the following asks for 50 per page:

<input type="hidden" name="num" value="50">

What would you use this for? If you regularly need a quick way to search for certain file types in a certain place, a form with hidden variables works really well. If this is a one-time search, you can always just hack the results URL (see “Understanding Google URLs” earlier in this chapter), tacking the variables and their associated values onto the URL of the results page.
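For the one-time case, the results URL is just the form’s fields written out as name=value pairs. The following Python sketch builds such a URL with the standard library’s urllib.parse.urlencode. The function name google_results_url is made up for illustration, and the parameter names are simply the ones used in this hack; whether Google still honors each of them isn’t guaranteed.

```python
from urllib.parse import urlencode

def google_results_url(query, **options):
    """Build a Google results URL from a query plus extra form variables."""
    params = {"q": query}   # the visible text field in the forms above
    params.update(options)  # hidden variables such as as_filetype or num
    # urlencode escapes each value (spaces become +, quotes become %22).
    return "http://www.google.com/search?" + urlencode(params)

url = google_results_url("python", as_filetype="pdf",
                         as_sitesearch="oreilly.com", num=100)
print(url)
```

Because urlencode escapes the values for you, a phrase search such as "cat acupuncture" survives the trip intact as %22cat+acupuncture%22.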

Mixing Hidden File Types: an Example

The O’Reilly web site (http://www.oreilly.com) contains hundreds of chapter previews from O’Reilly books in Adobe PDF format. If you want to find just the PDF files on the site, you must figure out how the site’s search engine works or pester O’Reilly to add a file type search option. But you can put together your own search form that finds PDF files with the matching search terms on the oreilly.com site and read some free chapters from O’Reilly books in the process.

Tip

Even though you’re creating a handy search form, you’re still resting on the assumption that Google’s indexed most or all of the site you’re searching. Until you know otherwise, assume that any search results Google gives you are incomplete.

Your form looks something like this:

<!-- Search oreilly.com for PDFs -->
<form method="get" action="http://www.google.com/search">
<input type="text" name="q" size=31 maxlength=255 value="">
<input type="submit" name="sa" value="Search Google">
<input type="hidden" name="as_filetype" value="pdf">
<input type="hidden" name="as_sitesearch" value="oreilly.com">
<input type="hidden" name="num" value="100">
</form>
<!-- Search oreilly.com for PDFs -->

Using hidden variables is handy when you want to search for one particular thing all the time. But if you want to be flexible in what you’re searching for, creating an alternate form is the way to go.

Creating Your Own Google Form

Some variables work well hidden; however, for other options, you can give your form users visible options to provide more flexibility.

Let’s go back to the previous example. You want to let your users search for PDF files, but you also want them to be able to search for Excel and Microsoft Word files. In addition, you want them to be able to search not only oreilly.com, but also the State of California or the Library of Congress web sites. Obviously, there are various ways to design this form; this example uses a couple of simple pull-down menus.

<!-- Custom Google Search Form-->
<form method="get" action="http://www.google.com/search">
<input type="text" name="q" size=31 maxlength=255 value=""><br />
Search for file type:
<select name="as_filetype">
  <option value="pdf">PDF</option>
  <option value="xls">Excel</option>
  <option value="doc">Word</option>
</select><br />
Search site:
<select name="as_sitesearch">
  <option value="oreilly.com">oreilly.com</option>
  <option value="state.ca.us">State of California</option>
  <option value="loc.gov">The Library of Congress</option>
</select>
<input type="hidden" name="num" value="100">
<input type="submit" value="Search Google">
</form>
<!-- Custom Google Search Form-->

FaganFinder (http://www.faganfinder.com/engines/google.shtml) is a wonderful example of a thoroughly customized form.

If you find yourself running fairly complex queries on a regular basis, you can speed things up by setting a few options in a custom form. And chances are good that if you find the convenience of a custom form helpful, others will too. So, making your custom form available on your web site is a good way to let others share in your productivity.

Compare Google and Yahoo! Search Results

Pit Google and Yahoo! against each other and find more search results in the process.

If you’ve ever searched for the same phrase at both Google and Yahoo!, you’ve probably noticed that the results can be surprisingly different. That’s because Google and Yahoo! have different ways of determining which sites are relevant for a particular phrase. Though both companies keep the details of how they rank results secret (to thwart people who would take advantage of them), both Yahoo! and Google provide some clues about what goes into their ranking systems.

At the heart of Google’s ranking system is a proprietary method it calls PageRank, and Google doesn’t give detailed information about it. But Google does say this:

Google’s order of results is automatically determined by more than 100 factors, including our PageRank algorithm.

Here’s the official word from Yahoo!:

Yahoo! Search ranks results according to their relevance to a particular query by analyzing the web page text, title, and description accuracy as well as its source, associated links, and other unique document characteristics.

Though we might never know exactly why results differ between the two search engines, at least we can have some fun spotting the differences and end up with more search results than either site would have offered on its own.

One way to compare results is to simply open each site in a separate browser window and manually scan for differences. If you search for your favorite dog breed (say, "australian shepherd"), you’ll find that the top few sites are the same on both Yahoo! and Google, but the two search engines quickly diverge into different results. At the time of this writing, both sites happen to estimate exactly 1,030,000 total results for this particular query; for other queries, though, the estimated result counts can differ and offer another way to spot differences between the sites.

Viewing both sets of results in different windows is a bit tedious, but a clever Norwegian developer named Asgeir S. Nilsen has made the task easier with a site called Twingine.

Twingine

The Twingine site (http://twingine.com) contains a blank search form into which you can type any search query. When you click Search, the site brings up the results pages for that query from both Yahoo! and Google, side by side. To be fair, the sides on which Google and Yahoo! appear change at random, so people who prefer one side of the screen to the other won’t be biased. Plugging "australian shepherd" into Twingine yields a page such as the one shown in Figure 1-20.

Figure 1-20. Google and Yahoo! going head to head at Twingine

Clicking Next or Previous in the top frame at Twingine takes you to the next or previous page in the search results at both sites.

Surfing the pages in the search results at Twingine can be a bit tricky. You’ll probably want to open linked search results in a new window or tab, so that you can keep your place in the search results at both Yahoo! and Google. You can open links in a new window by right-clicking the link (Ctrl-click on a Mac) and choosing Open Link in New Window from the menu. You can also set your preferences at either search engine to automatically open links in a new window when you click a search result.

Yahoo! Versus Google Diagram

Another site, developed by Christian Langreiter, adds a bit of analysis to the different sets of search results between Yahoo! and Google. If you have Flash installed, you can type a search query into the form at http://www.langreiter.com/exec/yahoo-vs-google.html, and the site fetches the search results from both engines in the background using their open APIs. The site delivers the results in a chart, as shown in Figure 1-21.

Figure 1-21. Mapping the differences between Yahoo! and Google results

Each blue or white dot in the diagram represents a search result URL, and the position of the dot represents its ranking. The dots on the far left are the top search results, and the farther right a dot appears, the lower that result ranks. The blue lines connect the same URL in both result sets, so you can see exactly where Google and Yahoo! line up.

In Figure 1-21, you can see that the top search result for "australian shepherd" is the same URL, but the lines aren’t as evenly matched further down in the results. As you hover over each dot, you see the URL, which you can click to visit that particular search result.

The white dots in the diagram represent URLs that appear in one engine’s results but not in the other’s. And as this diagram demonstrates, neither search engine has a monopoly on matching pages, nor does either engine’s index have every page on a particular topic.
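The diagram’s own code isn’t described here, but the comparison it draws is easy to sketch. The following Python function (the name compare_rankings is hypothetical) takes two ranked lists of URLs, assumed to contain no duplicates, and reports the shared URLs with their rank in each list alongside the URLs unique to each side. The example lists are made up, not real search output.

```python
def compare_rankings(first, second):
    """Compare two ranked lists of URLs (position 0 is the top result).

    Assumes neither list contains duplicate URLs. Returns a tuple of:
      shared      - dict mapping each URL in both lists to its 1-based rank in each
      only_first  - URLs that appear only in the first list
      only_second - URLs that appear only in the second list
    """
    rank_first = {url: i + 1 for i, url in enumerate(first)}
    rank_second = {url: i + 1 for i, url in enumerate(second)}
    shared = {url: (rank_first[url], rank_second[url])
              for url in first if url in rank_second}
    only_first = [url for url in first if url not in rank_second]
    only_second = [url for url in second if url not in rank_first]
    return shared, only_first, only_second


# Hypothetical result lists, not real Google or Yahoo! output.
google = ["a.example.com", "b.example.com", "c.example.com"]
yahoo = ["a.example.com", "c.example.com", "d.example.com"]
shared, only_google, only_yahoo = compare_rankings(google, yahoo)
print(shared)       # URLs both engines returned (the blue lines)
print(only_google)  # URLs only Google returned (white dots)
print(only_yahoo)   # URLs only Yahoo! returned (white dots)
```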

Piling Results Together

If you want to compare even more results than the big two provide, a service called Dogpile (http://www.dogpile.com) will gather responses from six different search engines into a single page. As shown in Figure 1-22, each individual match for the query lists the search engines where that result was found.

Figure 1-22. Comparing Google and Yahoo! results at Dogpile.com

By clicking the search engine buttons at the top of the page, you can directly compare the top 12 results from Google, Yahoo!, and other search engines. Any listing unique to a particular search engine is highlighted in yellow—so you can see at a glance what you’d be missing by using either Google or Yahoo! alone.

While the individual search results in the main column show the “Best of All Search Engines,” be aware that some of them come from search engine advertising rather than purely organic search results. Each listing indicates which search engines it came from, and ads are clearly labeled.
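Dogpile doesn’t document how it merges and orders results, so here’s a hedged Python sketch of just the bookkeeping: pool each engine’s list, record which engines returned each URL, and pick out the listings unique to one engine. The function names are hypothetical, the output uses first-seen order rather than any real ranking, and the example data is made up.

```python
def merge_results(results_by_engine):
    """Merge ranked URL lists from several engines into one list.

    results_by_engine maps an engine name to that engine's ranked URLs.
    Returns (url, engines) pairs in first-seen order, where engines lists
    every engine that returned the URL.
    """
    found_by = {}
    for engine, urls in results_by_engine.items():
        for url in urls:
            engines = found_by.setdefault(url, [])
            if engine not in engines:
                engines.append(engine)
    return list(found_by.items())


def unique_to(merged, engine):
    """URLs that only the given engine returned (what you'd miss without it)."""
    return [url for url, engines in merged if engines == [engine]]


# Hypothetical results, not real search engine output.
merged = merge_results({
    "Google": ["a.example.com", "b.example.com"],
    "Yahoo!": ["a.example.com", "c.example.com"],
})
for url, engines in merged:
    print(url, "found by", ", ".join(engines))
print(unique_to(merged, "Google"))
```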

If you already do serious research with search engines, you’re very aware that having several search tools at your disposal is better than relying on one. And with the methods mentioned in this hack, you can compare and contrast the tools, giving you more results to choose from.

Cover Your Tracks

By understanding how your browser stores information related to your Google searches, you can be sure that your searches are your own.

Most of us think of our Google searches as something private, an exchange between one individual and Google. But if you share a computer with others, your searches might not be as private as you think. Whether you’re searching for a surprise birthday gift, a private medical concern, legal advice, or “researching” some risqué topic, there are times when your browser’s memory can come back to haunt you.

By default—in an effort to help your memory—your computer remembers your past Google searches and stores them so you can access them later. There are several ways your computer accomplishes this, and you should be aware of each of them if you want to cover your tracks completely.

Browser History

The first and most obvious place that your browser stores your past searches is in your browser history. You can quickly view your current browser history in Firefox or Internet Explorer by typing Ctrl-H (Command-Shift-H on a Mac). A new pane will open that includes all of the sites you’ve visited recently, along with the specific pages at those sites, as shown in Figure 1-23.

Figure 1-23. Browser history pane in Firefox

From the pane on the left, you can easily revisit sites. Open the google.com folder to see recent searches, and note that other Google searches, such as Google Images searches, are stored in their own folders (images.google.com, for example). If you see a search you’d rather not share with others, you can simply highlight that particular entry, right-click, and click Delete on the menu.

Also be aware that your browser history is exposed through your address bar. As you start typing a URL into the address bar, the browser tries to guess where you want to go by offering matching URLs from your history. If you start typing http://www.g, you’ll find a list of recent Google searches, as shown in Figure 1-24.

Figure 1-24. Address history in Internet Explorer

By studying the URL, you can see what search term was used, and can highlight the entry to visit that page of search results. In Firefox, you can delete any entry by highlighting it and typing Shift-Delete. Internet Explorer users can only selectively delete from the History pane.
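Reading the search term out of a URL is the same decoding you’d otherwise do by eye. Here’s a minimal Python sketch using the standard library’s urllib.parse; it assumes, as in the URLs and forms in this chapter, that the term lives in the q parameter, and that web and image results live under /search and /images. The function name search_term is hypothetical.

```python
from urllib.parse import urlparse, parse_qs

def search_term(url):
    """Return the q= search term from a Google results URL, or None."""
    parsed = urlparse(url)
    # Only look at Google hosts and the assumed web/image result paths.
    if not parsed.netloc.endswith("google.com"):
        return None
    if parsed.path not in ("/search", "/images"):
        return None
    values = parse_qs(parsed.query).get("q")  # decodes + and %XX escapes
    return values[0] if values else None


print(search_term("http://www.google.com/search?hl=en&q=surprise+party+ideas"))
```

This is also a reminder of why clearing history matters: anyone (or any script) scanning your history can recover your exact queries this easily.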

If you want to completely remove your browser history, there’s a faster way than deleting each entry one at a time. Here are the steps for purging your history:

Internet Explorer

Choose Tools→Internet Options from the top menu. Look for the History section on the General tab and click Clear History. You can also adjust the number of days you’d like to keep pages in your browser history; set this to 0 to disable your history completely. Click OK, and your browser history will be gone.

Firefox

Choose Tools→Options (Firefox→Preferences on a Mac) and click Privacy on the top menu. Choose the History tab and click Clear Browsing History Now. You can also set the number of days you’d like to keep pages here—with 0 disabling the feature.

Opera

Opera stores typed-in addresses and visited pages in two distinct places, so you want to be sure to clear both. Choose Tools→Preferences (Opera→Preferences on a Mac) from the top menu, click the Advanced tab, and then click History from the menu. Click Clear next to Typed in Addresses and Visited Addresses (only Addresses on a Mac). You can also use this opportunity to set the number of entries you’d like Opera to remember—up to 500 typed-in addresses and up to 10,000 visited addresses. Set this to 0 to disable your history.

Safari users on Mac OS X can manage their browser history through the History menu option shown in Figure 1-25.

Figure 1-25. The History menu in Safari

Unfortunately, you can’t selectively delete entries from your Safari browsing history, but you can click Clear History to remove all of your past browsing.

Safari users on Mac OS X can take advantage of a feature called Private Browsing. With Private Browsing enabled, sites aren’t added to the browser history and form data isn’t saved. You can use the Private Browsing mode at all times to effectively disable your browser history.

Saved Form Data

Another place where your past Google searches can be found is in saved form data. Having this data available is a convenience, because you can type a single letter into the Google search form and get a list of your past searches that start with that letter, as shown in Figure 1-26.

Figure 1-26. Saved form data in Firefox

Instead of retyping complex queries that you put together in the past, you can simply choose a past query, click it or press Enter, and the search is rerun. But if you’d rather not share these past queries with others on your computer, you need to delete them.

You can selectively delete entries from this menu in both Firefox and Internet Explorer by highlighting an entry and typing Shift-Delete.

Here are the steps to delete all your saved form data in one go:

Internet Explorer

Choose Tools→Internet Options from the top menu and then click the Content tab. Click AutoComplete and then the Clear Forms button. Uncheck the box next to Forms and click OK to disable AutoComplete.

Firefox

Choose Tools→Options (Firefox→Preferences on a Mac) from the top menu, click Privacy, and choose the Saved Forms tab. Click the button labeled Clear Saved Form Data Now to remove your past form entries. You can also take the opportunity to uncheck the box next to “Save information” to disable the feature.

Safari

Choose Safari→Preferences from the top menu and choose AutoFill. Click Edit... next to “Other forms” and highlight google.com (and any others you’d like to remove) on the list. Click Remove, and your saved form data is deleted. You can also uncheck the box next to “Other forms” to disable the AutoFill feature.

Tip

At the time of this writing, Opera has a feature called Wand that saves usernames and passwords, but the browser doesn’t save form data as the other browsers do.

Even with your browser history and saved form data gone, there are still ways for persistent snoops to find your Google searches.

Browser Cache

All browsers use a cache to store recently accessed web pages and images. With a local copy of the files on your computer, the browser can display pages much faster if you visit the site again in the future. The cache also leaves a trail of your surfing history, including Google searches.

Figure 1-27 shows the Temporary Internet Files folder where Internet Explorer stores cached items.

Figure 1-27. Viewing the Internet Explorer cache

As you can see, the Internet Address is in plain view, along with the search queries used. The first trick to removing items from your cache is finding the cache folder. Typically, these are buried deep in your filesystem and given cryptic names because they’re not intended to be accessed by humans. Luckily, they’re easy to browse if you know how to get there:

Internet Explorer

To find your cache, choose Tools→Internet Options from the menu and click Settings... under Temporary Internet Files. Click View Files... to bring up the files in an Explorer window. From there, you can selectively delete any files in your cache, including Google pages.

Firefox

To view your cache, type about:cache in the address bar and click List Cache Entries to see the files in your cache. Though you can’t selectively delete through this page, the cache directory path is listed at the top. From there, you can browse to the files with Windows Explorer (or the Finder on a Mac).

Opera

To find your cache directory, choose Help→About Opera from the top menu. Your cache directory is listed on the page, and from there you can delete the past Google results pages you’ve visited.

Safari

In Finder, browse to your Safari cache folder, ~/Library/Caches/Safari/, and selectively delete Google pages.

While you might not need to go to this extreme to remove your past Google searches, knowing where the information is located gives you the choice. And even going through these steps is no guarantee that the information is gone. In the hands of a hard-disk forensics expert, even deleted information can often be recovered.

A complete privacy strategy is beyond the scope of this book, but you can turn to Computer Privacy Annoyances by Dan Tynan (O’Reilly) for even more information about keeping your personal information private.

Improve Google’s Memory

With a feature called Search History, Google stores the searches you’ve made and the links you’ve followed so you can go back to them in the future.

Google is an impressive organizer of information, but it’s not very personable. Google is very much the same for me as it is for you. In fact, if you search Google with the word personable, you’ll see the same results I do. However, Google is working on technology that will tailor its search results to you as an individual. One step in that direction is the Search History (a.k.a. Personalized Search) feature, in beta testing at the time of this writing.

You’ve probably already experienced how Google’s memory can help you recall a search you did in the past. As you type letters into the main Google Search form, your browser tries to complete your thought, recalling past searches. This limited form of memory [Hack #11] can be handy, but it’s not terribly accurate or organized. For one, you can’t tell your browser which searches were successful and which weren’t. You can’t highlight favorite results or organize them in any way.

If you turn on Google’s memory through the Search History feature, you can let Google do the work of remembering how you use the site. In addition, you have access to your search history no matter which computer you use to access the Web, because your history is stored at Google instead of on your local computer.

The best way to get to know how Search History works is to try it out. You need a Google Account to use Search History; if you have a Gmail account, you’re ready to go. If you don’t have a Google Account yet, browse to https://www.google.com/accounts/NewAccount and sign up. Google offers the option to disable Personalized Search when you create an account, as shown in Figure 1-28, but if you’re there to try out Search History, you need to leave this unchecked.

Figure 1-28. Google Account sign-up page

Once you have an account, browse to the Google home page, click Sign In (if you’re not signed in already), then click My Account at the top of the page. From your account page, click the Personalized Search link under Try New Services. From there, your Search History is activated.

Once your account is activated, you can use Google as you normally would and have access to the following list of features.

Searches and documents clicked

Search History will remember every search you make at Google and every site you click to visit—recording the time and date of each click.

Searches without clicked results

Even searches in which you don’t click any results will be stored for later reference.

Bookmarks

From your Search History page, you can highlight links you’ve clicked to save as bookmarks. You can give each bookmark a number of labels so you can find it later. For example, you might give a bookmark to the O’Reilly Hacks site (http://hacks.oreilly.com) the labels books, O’Reilly, and geek to help you find the bookmark later. You can also add your own notes to each bookmark.

Search Trends

Once you’ve used Search History for a while, Google will help you visualize your searching activity by spotting trends and tabulating the number of searches per day.

Personalized Results

As Google gets to know you, it will refine the standard Google Search results page to match your past searching activity. Links you’ve bookmarked will show their labels in searches, and you can block some sites from showing up in search results.

Searchable History

Your history itself is searchable. So if you want to get back to that geeky book site you know you once found through Google, you can try a search on your unique history to find the site.

Tip

At the time of this writing, Search History is supported across Web, Images, News, and Froogle searches. Other Google searches—such as Groups, Local, or Book searches—will not appear in your history.

To access the features, click Search History at the top of the Google main page. You’ll see a list of your recent searches organized by date, as shown in Figure 1-29.

Figure 1-29. Google Search History page

Each bold entry in the center column is a search you performed at Google, with links clicked from that search directly below. Use the links to the left to filter the list to searches performed at a particular Google property. For example, check the Images box to see your recent Google Image searches, as shown in Figure 1-30.

Figure 1-30. Google Images Search History

If there’s a link or image that you want to save more permanently for future reference, click the star next to that item’s listing. This saves the item in your Bookmarks. In addition, if there’s an item you want Google to forget, click the Remove items link on the left side of the page, choose the item, and click Remove.

You can view any item you’ve bookmarked by clicking the Bookmarks link on the left side of the page. From there, you can edit any bookmark by adding labels or notes. As you label your bookmarks, each label appears under the Bookmarks link on the left side of the page. Clicking a specific label shows you only those bookmarks with that particular label—showing just a list of your bookmarked sites related to photography, for example.

There’s currently no way to share your history or bookmarks with others, and your search history is as private as your Google Account password. So, while it might feel odd to save all your Search History for review, it’s a step toward your own personal search engine, with results just for you. And if you ever want to part ways with your Search History, click the My Account link and choose Delete Personalized Search from the menu. Your history will be history.

Find Out What Google Thinks ___ Is

What does Google think of you, your friends, your neighborhood, or your favorite movie?

If you’ve ever wondered what people think of your hometown, your favorite band, your favorite snack food, or even you, Googlism (http://www.googlism.com) may provide you with something useful.

The Interface

The interface is dirt simple. Enter your query and check the appropriate radio button to specify whether you’re looking for a who, a what, a where, or a when. Figure 1-31 shows a representative results page for Sherlock Holmes, the famous fictional detective. You can also use the tabs to see what other objects people are searching for and which searches are the most popular.

Figure 1-31. Googlism results for Sherlock Holmes

Warning

Some of the results you find are not safe for work.

What You Get Back

Googlism responds with a list of things Google believes about the query at hand, be it a person, place, thing, or moment in time. For example, a search for Perl and “What” returns, along with a laundry list of others:

Perl is y2k compliant
Perl is not my favourite programming language
Perl is the coder's language of choice
Perl is the language of love

These are among the more humorous results for Steve Jobs and “Who”:

steve jobs is my new idol
steve jobs is at it again
steve jobs is trying to kill me

To figure out what page any particular statement comes from, simply copy and paste it into a plain old Google search, with the complete phrase in quotes. That last statement, for instance, came from a 2002 blog post about iMacs at http://www.fismo.com/KeepUp/fog0000000025.html.

Practical Uses

For the most part, this is a party hack—a good party hack. It’s a fun way to aggregate related statements into a silly (and occasionally profound) list.

But that’s just for the most part. Googlism also works as a handy ready-reference application, allowing you to quickly find answers to simple or simply asked questions. Just ask a question of Googlism in a way that can end with the word “is.” For example, to discover the capital of Virginia, enter The capital of Virginia. To learn why the sky is blue, try The reason the sky is blue.

Sometimes, this doesn’t work very well; try the oldest person in the world, and you’re immediately confronted with a variety of contradictory information. You’d have to visit each page represented by a result and see which answer, if any, best suits your research needs.
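When answers conflict, a quick tally of the completions at least shows which answer recurs most often. Frequency is not accuracy, so you would still check the pages themselves. A minimal sketch; `tally_answers` is a hypothetical helper and the input statements are made up:

```python
from collections import Counter


def tally_answers(statements, prefix):
    """Rank the completions of `prefix` by how often they occur.

    Statements and prefix are lowercased and their whitespace collapsed
    before comparison; statements that don't start with the prefix are skipped.
    """
    prefix_norm = " ".join(prefix.lower().split())
    counts = Counter()
    for statement in statements:
        text = " ".join(statement.lower().split())
        # Require a space after the prefix so "is" doesn't match "isn't"
        if text.startswith(prefix_norm + " "):
            counts[text[len(prefix_norm):].strip()] += 1
    return counts.most_common()


# Made-up statements of the kind a Googlism-style search returns
statements = [
    "The capital of Virginia is Richmond",
    "the capital of virginia is  Richmond",
    "The capital of Virginia is on my travel list",
]
print(tally_answers(statements, "The capital of Virginia is"))
# [('richmond', 2), ('on my travel list', 1)]
```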

Expanding the Application

This application is a lot of fun, but it can be extended. The trick is to work out how web page authors typically phrase the statements you’re after.

For example, when first spelling out an acronym, many writers use the words “stands for,” so you can add a Googlism that searches for your keyword followed by that phrase. Do a Google search for "SETI stands for" and "DDR stands for" and you’ll see what I mean.

When referring to animals, plants, and even stones, writers often use the phrase “are found,” so you can add a Googlism that locates things. Do a Google search for "sapphires are found" and "jaguars are found" and see what turns up.
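One way to organize such extensions is a table of trigger phrases, each turned into an exact-phrase query for a keyword. This is a sketch only: `TRIGGERS` and `googlism_queries` are hypothetical names, and the three phrases are simply the ones discussed in this hack.

```python
# Hypothetical trigger table: each entry pairs a kind of question with the
# phrase writers tend to use when answering it
TRIGGERS = {
    "what is it": "is",
    "what it stands for": "stands for",
    "where it is found": "are found",
}


def googlism_queries(keyword):
    """Return one exact-phrase query per trigger, keyed by question kind."""
    return {kind: f'"{keyword} {phrase}"' for kind, phrase in TRIGGERS.items()}


for kind, query in googlism_queries("sapphires").items():
    print(f"{kind}: {query}")
```

Adding a phrase you discover is then a one-line change to the table.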

See if you can think of any phrases that are in common usage, and then check those phrases in Google to see how many results each phrase has. You might get some ideas for a topic-specific Googlism tool yourself.

Browse the World Wide Photo Album

Take a random stroll through the world’s photo album using some clever Google Image searches (and, optionally, a smidge of programming know-how).

The proliferation of digital cameras and the growing popularity of camera phones are turning the Web into a worldwide photo album. It’s not only the holiday snaps of your Aunt Minnie or the minutiae of your moblogging friend’s day that are available to you. You can actually take a stroll through the publicly accessible albums of perfect strangers if you know where to look. Happily, Google has copies, and a couple of hacks know just where to look.

Random Personal Picture Finder

Digital photo files have relatively standard filenames (e.g., DSC01018.JPG) by default and are usually uploaded to the Web without being renamed. The Random Personal Picture Finder (http://www.diddly.com/random) sports a clever little snippet of JavaScript code that simply generates one of these filenames at random and queries Google Images for it.

The result, shown in Figure 1-32, is something like looking through the world’s photo album: people eating, working, posing, and snapping photos of their cats, furniture, or toes. And since it’s a normal Google Images search, you can click on any photo to see the story behind it, and the other photos nearby.

Neat, huh?

Figure 1-32. The Random Personal Picture Finder

Warning

Note that people photograph a good deal more than their toes (or the toes of others). While an informal series of Shift-Reloads in my browser turned up only a couple of questionable bits of photographic work, you should assume the results are not workplace- or child-safe.

The code behind the scenes, as I mentioned, is really very simple: a swatch of JavaScript (view the source of http://www.diddly.com/random/random.html in your browser to see the JavaScript bits for yourself) and a list of camera types and their respective filename structures (http://www.diddly.com/random/about.html). You’re simply redirected to Google Images with the generated search query in tow.

A smidge of Python illustrates just how simple it is to generate a link to a random collection of photos shot with a Canon digital camera:

$ python
ActivePython 2.4 Build 244 (ActiveState Corp.) based on
Python 2.4 (#60, Feb  9 2005, 19:03:27) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from random import randint
>>> linkform = 'http://images.google.com/images?q=IMG_%s.jpg'
>>> print(linkform % str(randint(1, 9999)).zfill(4))
http://images.google.com/images?q=IMG_3275.jpg

You can easily use this as the basis of a CGI script that acts in the same manner as the Random Personal Picture Finder.
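The Random Personal Picture Finder’s own server code isn’t shown here, so what follows is a sketch under stated assumptions rather than a port. It swaps the CGI script for a small WSGI app (Python’s `cgi` module was removed from the standard library in Python 3.13) and picks from a hypothetical `PATTERNS` table: `IMG_` matches the session above and `DSC` follows the DSC01018.JPG example, but the digit counts are assumptions to check against real files.

```python
import random
from urllib.parse import quote
from wsgiref.simple_server import make_server

# Hypothetical (prefix, digit count, extension) table of camera filename styles
PATTERNS = [
    ("IMG_", 4, ".jpg"),
    ("DSC", 5, ".JPG"),
]


def random_filename(rng=random):
    """Pick a pattern and fill it with a random, zero-padded number."""
    prefix, digits, ext = rng.choice(PATTERNS)
    number = rng.randint(1, 10 ** digits - 1)
    return f"{prefix}{number:0{digits}d}{ext}"


def app(environ, start_response):
    """WSGI app that redirects every request to a random Google Images query."""
    location = "http://images.google.com/images?q=" + quote(random_filename())
    start_response("302 Found", [("Location", location)])
    return [b""]


def serve(port=8000):
    """Serve the app on localhost until interrupted."""
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

Calling `serve()` and loading http://localhost:8000/ in a browser should land you on a different random search with each reload.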

Searching Personal Sites

In addition to finding personal photos based on common filenames, you can also use Google Images to search sites that host personal photos. The collision of digital photography and blogs [Hack #41] means millions are posting their snapshots along with their posts. If you limit your image searches to common blog domains, you’ll find thousands of personal photographs.

For example, it’s common for people to post to their blog when they get a new car and include a picture for their friends, family, and complete strangers to look at. If you want to find some personal pictures of cars, browse to Google Images (http://images.google.com) and try the following query using the site: keyword:

"new car" site:blogger.com

You’ll find pictures that have been posted to Blogger’s image-hosting service. You can often click the photo, find the post, and read an entire story behind a particular photo.

If you want to search across several services at once, you can combine queries. Say you want to search for photos of cars across both Blogger and competitor TypePad. Try the following query:

"new car" site:typepad.com OR site:blogger.com

There are hundreds of sites that host personal photos; all you need to do is find the domains. Here are a few to get you started: xanga.com, geocities.com, textamerica.com, flickr.com, and smugmug.com. To find more, take a look at the Photo Sharing category in the Google Directory:

http://www.google.com/Top/Computers/Internet/On_the_Web/Web_Applications/Photo_Sharing/
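Once you’ve collected a few domains, the OR’d site: query can be assembled mechanically. A minimal sketch; `multi_site_query` and `image_search_url` are hypothetical helpers, and the URL uses the same images.google.com form shown earlier in this hack.

```python
from urllib.parse import urlencode


def multi_site_query(phrase, domains):
    """Combine an exact phrase with site: restrictions joined by OR."""
    sites = " OR ".join(f"site:{domain}" for domain in domains)
    return f'"{phrase}" {sites}'


def image_search_url(query):
    """Return a Google Images URL for `query`, percent-encoding it."""
    return "http://images.google.com/images?" + urlencode({"q": query})


query = multi_site_query("new car", ["typepad.com", "blogger.com"])
print(query)  # "new car" site:typepad.com OR site:blogger.com
print(image_search_url(query))
```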

The Web has become our global photo album. And while browsing through the millions of personal photos available can verge on voyeuristic, it’s a reminder that we all love to take, share, and look at photographs.

Paul Bausch and Aaron Swartz

Find Similar Images

Explore the Web in a new way by finding other images of the same name.

I will be the first to admit that this hack has no practical purpose. I originally conceived it in an IRC channel, when someone posted a link to http://images.google.com/images?q=P5170003. That particular keyword is a filename used by a particular brand of digital camera. Some cameras generate filenames based on the date the photo was taken and a unique identifier within the camera; others simply use an incrementing identifier starting with 1. Many people take digital images and then simply publish them online, without giving the photo a more meaningful filename. The end result is that you can use Google Images to find a random selection of images published by different people. (This particular query finds photos taken on May 17, my wedding anniversary.)
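The date-based naming scheme can be turned around to build a query for any day of the year. This is a sketch resting on assumptions: `date_filename` is a hypothetical helper, the layout (P, one month character, two-digit day, four-digit counter) is inferred from the P5170003 example, and writing months 10 through 12 as A, B, and C is an assumption to verify against real files.

```python
def date_filename(month, day, counter):
    """Build a P-style filename such as P5170003 (May 17, photo 3).

    Assumption: months 10-12 are written as A, B, C so the month stays a
    single character.
    """
    if not 1 <= month <= 12 or not 1 <= day <= 31:
        raise ValueError("month or day out of range")
    month_code = "123456789ABC"[month - 1]
    return f"P{month_code}{day:02d}{counter:04d}"


name = date_filename(5, 17, 3)
print("http://images.google.com/images?q=" + name)
# http://images.google.com/images?q=P5170003
```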

Tip

This hack relies on the Greasemonkey extension (http://greasemonkey.mozdev.org/) for the Firefox web browser (http://www.mozilla.com/firefox/).

Anyway, this hack converts all unlinked images into links to Google Images to find other random images with the same filename. If that sounds silly, that’s because it is. It’s also surprisingly fun, if you like that sort of thing.

The Code

This user script runs on all pages. It uses the document.images collection to find all the images on the page and wraps each of them in a link to http://images.google.com/images?q= plus the image filename. Firefox seriously dislikes replacing an element with another element that contains the original element, so we use the cloneNode method to make a copy of the original <img> element, put it in an <a> element, and then replace the original <img>.

Save the following user script as similarimages.user.js:

// ==UserScript==
// @name          Find Similar Images
// @namespace     http://diveintomark.org/projects/greasemonkey/
// @description   links images to find similar images on Google Image Search
// @include       http://*
// @exclude       http://*.google.tld/*
// ==/UserScript==

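// Visit every image on the page (document.images is a live collection)
for (var i = document.images.length - 1; i >= 0; i--) {
    var elmImage = document.images[i];
    // Leave images that are already links alone
    if (elmImage.parentNode.nodeName.toUpperCase() == 'A') {
        continue;
    }
    // The filename is the last path segment of the image URL
    var usFilename = elmImage.src.split('/').pop();
    var elmLink = document.createElement('a');
    elmLink.href = 'http://images.google.com/images?q=' +
        encodeURIComponent(usFilename);
    elmLink.title = 'Find images named ' + usFilename;
    // Wrap a copy of the image in the link, then swap it in for the original
    var elmNewImage = elmImage.cloneNode(false);
    elmLink.appendChild(elmNewImage);
    elmImage.parentNode.replaceChild(elmLink, elmImage);
}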
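If you want to preview the links the script will generate without installing it, the same filename-to-query logic fits in a few lines of Python. A sketch only; `similar_images_link` is a hypothetical helper and example.com is a placeholder. One deliberate difference: `urlsplit` drops any `?query` or `#fragment` before the last path segment is taken, which the script’s `split('/').pop()` does not.

```python
from urllib.parse import quote, urlsplit


def similar_images_link(image_src):
    """Return (href, title) for an image URL, mirroring the user script."""
    # Take the last path segment, ignoring any query string or fragment
    filename = urlsplit(image_src).path.rsplit("/", 1)[-1]
    href = "http://images.google.com/images?q=" + quote(filename)
    return href, "Find images named " + filename


print(similar_images_link("http://example.com/photos/P5170003.JPG?size=large"))
# ('http://images.google.com/images?q=P5170003.JPG', 'Find images named P5170003.JPG')
```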

Running the Hack

After installing the user script (Tools→Install This User Script), visit http://randomness.org.uk/photos/index.cgi/months/may_2003. When you move your cursor over an image, you will see a tool tip displaying the filename of the image, as shown in Figure 1-33.

Figure 1-33. Image tool tips

Each image on the page is now a link to a Google Images search for images of the same name. This can lead to some pretty random results, as shown in Figure 1-34.

Figure 1-34. Other images named P5170003

Have fun exploring accidental cross-sections of the Web!

Mark Pilgrim

Track Stocks

A well-crafted Google query will usually net you company information beyond that provided by traditional stock services.

You can get a quick look at how a stock is performing by simply using a ticker symbol in the Google search form. For example, if you want to see how Google (the company) is faring during the day, type GOOG into Google, click Search, and you’ll find some quick data, as shown in Figure 1-35.

Figure 1-35. Google quick stock data lookup

You’ll see a recent stock price, data for the day, a chart showing recent performance, and links to more information at Google Finance, Yahoo! Finance, MSN Money, and other sites that track stocks. Click the ticker symbol or the chart to go to the Google Finance page for that stock, where you can compare dips and spikes in prices with company news, find background information on the company, and take part in discussions about the stock.

Beyond Google for Basic Stock Information

If you want a second opinion about stock performance, I recommend going straight to Yahoo! Finance (http://finance.yahoo.com) to quickly look up stocks by symbol or company name. There, you’ll find all the basics: quotes, company profiles, charts, and recent news. For more in-depth coverage, I heartily recommend Hoover’s (http://www.hoovers.com). Some of the information is free; for more depth, you must pay a subscription fee.

More Stock Research with Google

Try searching Google for:

"Tootsie Roll"

Now add the stock symbol, TR, to your query:

"Tootsie Roll" TR

Aha! Instantly, the search results shift to financial information. Now, add the name of the CEO:

"Tootsie Roll" TR "Melvin Gordon"

You end up with a nice, small, targeted list of results, as shown in Figure 1-36.

Figure 1-36. Using a stock symbol to limit results

Stock symbols are great “fingerprints” for Internet research. They’re consistent, they often appear along with the company name, and they’re usually enough to narrow your search results to relevant information.

There are also several words and phrases that you can use to narrow your search for company-related information. Replacing company with the name of the company you’re looking for, try these:

  • For press releases: "company announced", "company announces", "company reported"

  • For financial information: company "quarterly report", company SEC, company financials, company "p/e ratio"

  • For location information: company parking airport location (doesn’t always work but sometimes works amazingly well)
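These templates are easy to fill in mechanically. A sketch under assumptions: `TEMPLATES` and `company_queries` are hypothetical names; a multiword company name is wrapped in quotes when it stands alone (as in "Tootsie Roll" TR above) but left bare inside a quoted phrase; and the optional ticker is simply appended to every query, following the Tootsie Roll example. Whether that helps for every template is itself an assumption.

```python
# Query templates from the list above: {c} is the bare company name (used
# inside a quoted phrase) and {q} is the name quoted if it has spaces
TEMPLATES = {
    "press releases": ['"{c} announced"', '"{c} announces"', '"{c} reported"'],
    "financial information": [
        '{q} "quarterly report"',
        "{q} SEC",
        "{q} financials",
        '{q} "p/e ratio"',
    ],
    "location information": ["{q} parking airport location"],
}


def company_queries(company, ticker=None):
    """Fill every template for `company`, optionally appending its ticker."""
    quoted = f'"{company}"' if " " in company else company
    results = {}
    for topic, templates in TEMPLATES.items():
        queries = [t.format(c=company, q=quoted) for t in templates]
        if ticker:
            queries = [f"{query} {ticker}" for query in queries]
        results[topic] = queries
    return results


for topic, queries in company_queries("Tootsie Roll", ticker="TR").items():
    print(topic + ":", "; ".join(queries))
```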
