In addition to the simple search form you’ll find at http://search.yahoo.com, Yahoo! offers an Advanced Web Search form at http://search.yahoo.com/web/advanced. This form lets you refine your search in a number of ways, so you can narrow the results to a more useful list.
For example, if you’d like to find information about a generic topic, such as astronomy, you could go to Yahoo!, type astronomy into the search form, and find hundreds of sites related to the word. But if you want only a segment of those results, you can browse over to the Advanced Web Search form, type astronomy, and limit the results by top-level domain, as shown in Figure 1-9.
A search for astronomy across .gov sites returns only pages at NASA’s web site. The same search limited to .edu sites results in astronomy programs at various universities, and limiting to .com gives you astronomy magazines at the top of the results.
You can further refine your search by limiting it to a specific file format, such as PDF files, Excel spreadsheets, or XML files. For any given search, you can also override your global preferences settings for language, number of results, and adult content filtering.
To get started with hacking URLs, type a term into the Advanced Web Search form and click the Yahoo! Search button, which will take you to the results page. Once there, note the insanely long URL in the address bar of your browser. It will look something like this:
http://search.yahoo.com/search?_adv_prop=web&x=op&ei=UTF-8&va=astronomy&va_vt=any&vp_vt=any&vo_vt=any&ve_vt=any&vd=all&vst=.gov&vs=.gov&vf=all&vm=p&fl=0&n=20
Some of the variables in any given search URL are redundant or unnecessary. The web form basically acts as a URL-building tool that has assembled this URL for you, and it isn’t picky about which variables it includes. By understanding the pieces of the URL, you can construct your own queries using shorter URLs without the form.
Note that the domain is followed by /search?, followed by a series of variable/value pairs separated by ampersands. Not all of these variables will affect the search results, but there are some that are useful to play with. The variables are a bit cryptic (to keep the URLs as short as possible), so here’s a list of the relevant variables and what they represent.
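To see the variable/value pairs at a glance, the sample URL can be split apart with Python's standard urllib.parse module. This is only an illustrative sketch; it inspects the URL text and does not contact Yahoo!.

```python
from urllib.parse import urlsplit, parse_qsl

# The sample Advanced Web Search URL from the text, with stray spaces removed.
url = (
    "http://search.yahoo.com/search?_adv_prop=web&x=op&ei=UTF-8&va=astronomy"
    "&va_vt=any&vp_vt=any&vo_vt=any&ve_vt=any&vd=all&vst=.gov&vs=.gov"
    "&vf=all&vm=p&fl=0&n=20"
)

parts = urlsplit(url)
pairs = parse_qsl(parts.query)  # list of (variable, value) tuples, in URL order

print(parts.path)
for name, value in pairs:
    print(f"{name} = {value}")
```

Each printed line corresponds to one pair between ampersands in the original URL.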
The v* variables represent the way you’d like Yahoo! to handle the phrase. You can choose from the following variables:
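A rough sketch of how the query-type variables might be assembled. The names va, vp, vo, and ve are inferred from the sample URL (va=astronomy, plus the va_vt, vp_vt, vo_vt, and ve_vt pattern), and their meanings are assumed from the labels on the Advanced Web Search form: all of the words, the exact phrase, any of the words, and none of the words. The helper build_query and its argument names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed mapping from a readable name to the v* variable it stands for.
QUERY_TYPES = {
    "all_words": "va",   # all of the words (assumption)
    "phrase": "vp",      # the exact phrase (assumption)
    "any_words": "vo",   # any of the words (assumption)
    "none_words": "ve",  # none of the words (assumption)
}

def build_query(**terms):
    """Return a search URL using only the query-type variables supplied."""
    params = {QUERY_TYPES[name]: text for name, text in terms.items()}
    return "http://search.yahoo.com/search?" + urlencode(params)

print(build_query(all_words="astronomy", none_words="astrology"))
```

Whether the live service still honors a URL this short is not something the sketch can confirm.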
Another group of similarly patterned variables lets you limit searching to a specific part of a document, such as the title or URL. The format for these variables is v*_vt, where the asterisk is replaced by the type of primary search query. The possible values include any, title, and url. For example, if you’d like to search for pages that have the exact phrase astronomy magazine in the title, use the vp and vp_vt variables together, like so:
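One way such a URL could be put together is sketched below. It assumes that vp carries the exact phrase (following the v*_vt naming pattern) and that a URL containing only these two variables is accepted; neither assumption is checked against the live service.

```python
from urllib.parse import urlencode

BASE = "http://search.yahoo.com/search"

# vp: the exact phrase (assumed); vp_vt: where that phrase must appear.
params = {"vp": "astronomy magazine", "vp_vt": "title"}
url = f"{BASE}?{urlencode(params)}"

print(url)  # http://search.yahoo.com/search?vp=astronomy+magazine&vp_vt=title
```

The space in the phrase is encoded as a plus sign, which is the usual form encoding for query strings.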
If you’d like to limit your results to pages that have been updated recently, you can use the vd variable. You can get all results, which is the default, or limit them to pages updated within the last three months, six months, or year. The respective values for these are all, m3, m6, or y. So finding all documents that contain the phrase astronomy magazine that have been updated within the last three months looks like this:
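A small sketch of the date restriction, checking the vd value against the four values listed in the text before building the URL. The function name updated_within is hypothetical, and using vp for the phrase is an assumption based on the v*_vt naming pattern.

```python
from urllib.parse import urlencode

# The vd values named in the text: everything, three months, six months, one year.
UPDATED_VALUES = {"all", "m3", "m6", "y"}

def updated_within(phrase, vd="all"):
    """Build a phrase search limited by last-updated date."""
    if vd not in UPDATED_VALUES:
        raise ValueError(f"vd must be one of {sorted(UPDATED_VALUES)}")
    return "http://search.yahoo.com/search?" + urlencode({"vp": phrase, "vd": vd})

print(updated_within("astronomy magazine", "m3"))
# http://search.yahoo.com/search?vp=astronomy+magazine&vd=m3
```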
The vs variable is useful for limiting searches to a top-level domain, such as .com. In addition to top-level searches, you can narrow things to a specific web site. If you want to find every mention of astronomy magazine at the specific web site http://www.cnn.com, you could use the variable like this:
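A sketch of a site-limited search follows. It assumes that vs takes a bare host name (or a top-level domain such as .gov, as in the sample URL) rather than a full address, so the hypothetical helper site_search strips the scheme when it is handed a complete URL.

```python
from urllib.parse import urlencode, urlsplit

def site_search(phrase, site):
    """Build a phrase search restricted to a host or top-level domain."""
    # Accept "http://www.cnn.com" as well as "www.cnn.com" or ".com".
    host = urlsplit(site).hostname if "://" in site else site
    return "http://search.yahoo.com/search?" + urlencode({"vp": phrase, "vs": host})

print(site_search("astronomy magazine", "http://www.cnn.com"))
# http://search.yahoo.com/search?vp=astronomy+magazine&vs=www.cnn.com
```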
The default value; returns any type of document
Adobe PDF files
Microsoft Excel spreadsheets (note that this value is an abbreviation for the full file extension, .xls)
Microsoft PowerPoint presentations
Microsoft Word files
Files formatted for syndication across web sites
Plain text files, which typically end with .txt
The number of results is controlled by the n variable, which can be set only to some predetermined values: 10, 15, 20, 30, 40, or 100. To return the first 40 results for the phrase astronomy magazine, add the n variable, like so:
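Because n accepts only a fixed set of values, a script building these URLs may want to snap a requested count to one of them. The sketch below does that by picking the smallest allowed value that covers the request; the rounding rule and the name results_url are illustrative choices, not Yahoo! behavior.

```python
from urllib.parse import urlencode

# The predetermined values for n listed in the text.
ALLOWED_N = (10, 15, 20, 30, 40, 100)

def results_url(phrase, wanted):
    """Build a phrase search asking for roughly `wanted` results."""
    # Smallest allowed value that is at least `wanted`, capped at the largest.
    n = next((value for value in ALLOWED_N if value >= wanted), ALLOWED_N[-1])
    return "http://search.yahoo.com/search?" + urlencode({"vp": phrase, "n": n})

print(results_url("astronomy magazine", 40))
# http://search.yahoo.com/search?vp=astronomy+magazine&n=40
```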
There are other variables in advanced search URLs, but these are a few that will affect the content of search results. Now that you know why the initial Advanced Web Search URL was so long, you can use some of the variables to create your own advanced Yahoo! searches on the fly.
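As a closing illustration, here is one way the long form-generated URL might be trimmed to a shorter equivalent. The sketch keeps only the variables discussed in this section and drops any that are set to a value assumed to be the default (any for the v*_vt variables, all for vd and vf). Treating the remaining variables as unnecessary is an assumption; the text says only that some variables are redundant, not which ones.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

LONG_URL = (
    "http://search.yahoo.com/search?_adv_prop=web&x=op&ei=UTF-8&va=astronomy"
    "&va_vt=any&vp_vt=any&vo_vt=any&ve_vt=any&vd=all&vst=.gov&vs=.gov"
    "&vf=all&vm=p&fl=0&n=20"
)

# Variables covered in this section; everything else is dropped (assumption).
KEEP = {"va", "vp", "vo", "ve", "va_vt", "vp_vt", "vo_vt", "ve_vt",
        "vd", "vs", "vf", "n"}

# Values assumed to be defaults, so the pair adds nothing to the URL.
DEFAULTS = {"va_vt": "any", "vp_vt": "any", "vo_vt": "any", "ve_vt": "any",
            "vd": "all", "vf": "all"}

def shorten(url):
    """Return the URL with unlisted and default-valued variables removed."""
    parts = urlsplit(url)
    kept = [(name, value) for name, value in parse_qsl(parts.query)
            if name in KEEP and DEFAULTS.get(name) != value]
    return f"{parts.scheme}://{parts.netloc}{parts.path}?{urlencode(kept)}"

print(shorten(LONG_URL))
# http://search.yahoo.com/search?va=astronomy&vs=.gov&n=20
```

Comparing the results of the long and short URLs in a browser is the only real test of whether a dropped variable mattered.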