By understanding how Yahoo! Advanced Search URLs are structured, you can create your own Advanced Search queries on the fly.
In addition to the simple search form you’ll find at http://search.yahoo.com, Yahoo! offers an Advanced Web Search form at http://search.yahoo.com/web/advanced. This form lets you refine your search in a number of ways, so you can narrow the results to a more useful list.
For example, if you’d like to find information about a generic topic, such as astronomy, you could go to Yahoo!, type astronomy into the search form, and find hundreds of sites related to the word. But if you want only a segment of those results, you can browse over to the Advanced Web Search form, type astronomy, and limit the results by top-level domain, as shown in Figure 1-9.
A search for astronomy across .gov sites returns only pages at NASA’s web site. The same search limited to .edu sites returns astronomy programs at various universities, and limiting it to .com puts astronomy magazines at the top of the results.
You can further refine your search by limiting it to a specific file format, such as PDF files, Excel spreadsheets, or XML files. For any given search, you can also override your global preferences settings for language, number of results, and adult content filtering.
To get started with hacking URLs, type a term into the Advanced Web Search form and click the Yahoo! Search button, which takes you to the results page. Once there, note the insanely long URL in your browser’s address bar. It will look something like this:
http://search.yahoo.com/search?_adv_prop=web&x=op&ei=UTF-8&va=astronomy&va_vt=any&vp_vt=any&vo_vt=any&ve_vt=any&vd=all&vst=.gov&vs=.gov&vf=all&vm=p&fl=0&n=20
For any given search URL, some of the variables are redundant or unnecessary. The web form basically acts as a URL-building tool that has assembled this URL for you, and it isn’t picky about which variables it includes. By understanding the pieces of the URL, you can construct your own queries using shorter URLs without the form.
Note that the domain is followed by /search?, followed by a series of variable/value pairs separated by ampersands. Not all of these variables affect the search results, but some are useful to play with. The variables are a bit cryptic (to keep the URLs as short as possible), so here’s a list of the relevant variables and what they represent.
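To see those variable/value pairs individually, here is a minimal Python sketch that splits the long URL above with the standard library’s urllib.parse. It only parses the string and makes no request to Yahoo!. The names LONG_URL and query_pairs are illustrative, not part of any Yahoo! tool.

```python
from urllib.parse import urlsplit, parse_qsl

# The Advanced Web Search URL from above, rejoined onto one line.
LONG_URL = (
    "http://search.yahoo.com/search?_adv_prop=web&x=op&ei=UTF-8"
    "&va=astronomy&va_vt=any&vp_vt=any&vo_vt=any&ve_vt=any"
    "&vd=all&vst=.gov&vs=.gov&vf=all&vm=p&fl=0&n=20"
)


def query_pairs(url):
    """Return the URL path and its list of (variable, value) pairs."""
    parts = urlsplit(url)
    # parse_qsl splits on '&' and '=' and decodes '+' and %XX escapes.
    return parts.path, parse_qsl(parts.query, keep_blank_values=True)


path, pairs = query_pairs(LONG_URL)
print(path)
for name, value in pairs:
    print(f"{name:10} {value}")
```

Printing the pairs one per line makes it easy to spot which variables carry your actual query (va, vs) and which just repeat the form’s defaults.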
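The v* variables represent the way you’d like Yahoo! to handle the phrase. You can choose from the following variables:
Table 1-2. Query variables
- va: All of these words
- vp: The exact phrase
- vo: Any of these words
- ve: None of these words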
Another group of similarly patterned variables lets you limit searching to a specific part of a document, such as the title or URL. The format for these variables is v*_vt, where the asterisk is replaced by the letter of the query variable being limited (so vp_vt applies to vp). The possible values are any, title, or url. For example, if you’d like to search for pages that have the exact phrase astronomy magazine in the title, use the vp and vp_vt variables together, like so:
search?vp=astronomy+magazine&vp_vt=title
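Building these URLs by hand means remembering to replace spaces with + and to join pairs with &. The following is a minimal Python sketch that lets urlencode do that, assuming the full base URL is http://search.yahoo.com/search and that any, title, and url are the only v*_vt values, as described above. It only assembles a string; it doesn’t check whether Yahoo! still honors these variables. The helper phrase_search_url is a hypothetical name.

```python
from urllib.parse import urlencode

BASE_URL = "http://search.yahoo.com/search"
# Values this section lists for the v*_vt variables.
VT_VALUES = {"any", "title", "url"}


def phrase_search_url(phrase, where="any"):
    """Exact-phrase search (vp) limited to one part of the page (vp_vt)."""
    if where not in VT_VALUES:
        raise ValueError(f"vp_vt must be one of {sorted(VT_VALUES)}, got {where!r}")
    # urlencode turns spaces into '+' and joins the pairs with '&'.
    query = urlencode({"vp": phrase, "vp_vt": where})
    return f"{BASE_URL}?{query}"


print(phrase_search_url("astronomy magazine", "title"))
# http://search.yahoo.com/search?vp=astronomy+magazine&vp_vt=title
```

Rejecting unknown vp_vt values up front catches typos that would otherwise produce a URL the search engine might silently ignore.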
If you’d like to limit your results to pages that have been updated recently, you can use the vd variable. You can get all results, which is the default, or limit them to pages updated within the last three months, six months, or year. The respective values for these are all, m3, m6, or y. So a search for documents containing the phrase astronomy magazine that have been updated within the last three months looks like this:
search?vp=astronomy+magazine&vp_vt=any&vd=m3
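Because vd accepts only four values, a small check helps. This Python sketch assumes the values all, m3, m6, and y given above and the base URL http://search.yahoo.com/search; recent_phrase_url is an illustrative name. Since all is the default, the sketch leaves vd out of the URL in that case, which keeps the URL shorter.

```python
from urllib.parse import urlencode

BASE_URL = "http://search.yahoo.com/search"
# vd values from the text: all time, 3 months, 6 months, 1 year.
VD_VALUES = {"all", "m3", "m6", "y"}


def recent_phrase_url(phrase, updated="all"):
    """Exact-phrase search (vp) limited by last-updated date (vd)."""
    if updated not in VD_VALUES:
        raise ValueError(f"vd must be one of {sorted(VD_VALUES)}, got {updated!r}")
    params = {"vp": phrase, "vp_vt": "any"}
    if updated != "all":  # 'all' is the default, so it can be omitted
        params["vd"] = updated
    return f"{BASE_URL}?{urlencode(params)}"


print(recent_phrase_url("astronomy magazine", "m3"))
# http://search.yahoo.com/search?vp=astronomy+magazine&vp_vt=any&vd=m3
```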
The vs variable is useful for limiting searches to a top-level domain, such as .com. In addition to top-level domains, you can narrow things to a specific web site. If you want to find every mention of astronomy magazine at http://www.cnn.com, you could use the variable like this:
search?vp=astronomy+magazine&vp_vt=any&vs=cnn.com
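Since vs takes either a top-level domain or a site name, one query can be repeated across several of them. This Python sketch generates one URL per domain for the .gov, .edu, and .com comparison described earlier, plus cnn.com. It assumes the base URL http://search.yahoo.com/search; site_url is a hypothetical helper, and nothing is fetched.

```python
from urllib.parse import urlencode

BASE_URL = "http://search.yahoo.com/search"


def site_url(phrase, site):
    """Exact-phrase search (vp) limited to a top-level domain or one site (vs)."""
    # '.' is never escaped by urlencode, so ".gov" stays ".gov".
    return f"{BASE_URL}?{urlencode({'vp': phrase, 'vp_vt': 'any', 'vs': site})}"


# The same query across three top-level domains, then one specific site.
for site in (".gov", ".edu", ".com", "cnn.com"):
    print(site_url("astronomy magazine", site))
```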
The vf variable limits searches to a specific file type. Yahoo! supports a fixed set of file types, and here are the current values you can use with this variable:
- all: The default value; returns any type of document
- html: HTML documents
- pdf: Adobe PDF files
- xl: Microsoft Excel spreadsheets (note that this value is an abbreviation of the full file extension, .xls)
- ppt: Microsoft PowerPoint presentations
- msword: Microsoft Word files
- rss: Files formatted for syndication across web sites
- text: Plain text files, which typically end with .txt
To continue with the example, say you want to find the phrase astronomy magazine only in PowerPoint presentations. Append the vf variable, like so:
search?vp=astronomy+magazine&vp_vt=any&vf=ppt
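The vf values don’t always match file extensions (Excel is xl, not xls), so a lookup table is worth having. A minimal Python sketch follows, assuming exactly the eight values listed above and the base URL http://search.yahoo.com/search; file_type_url and VF_VALUES are illustrative names, not a Yahoo! API.

```python
from urllib.parse import urlencode

BASE_URL = "http://search.yahoo.com/search"
# vf values as listed in the text; note Excel is "xl", not "xls".
VF_VALUES = {"all", "html", "pdf", "xl", "ppt", "msword", "rss", "text"}


def file_type_url(phrase, file_type="all"):
    """Exact-phrase search (vp) limited to one file type (vf)."""
    if file_type not in VF_VALUES:
        raise ValueError(f"vf must be one of {sorted(VF_VALUES)}, got {file_type!r}")
    params = {"vp": phrase, "vp_vt": "any"}
    if file_type != "all":  # 'all' is the default, so it can be omitted
        params["vf"] = file_type
    return f"{BASE_URL}?{urlencode(params)}"


print(file_type_url("astronomy magazine", "ppt"))
# http://search.yahoo.com/search?vp=astronomy+magazine&vp_vt=any&vf=ppt
```

Calling file_type_url("astronomy magazine", "xls") raises ValueError instead of quietly producing a URL with a value that isn’t in the list.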
The number of results is controlled by the n variable, which can be set only to one of these predetermined values: 10, 15, 20, 30, 40, or 100. To return the first 40 results for the phrase astronomy magazine, add the n variable, like so:
search?vp=astronomy+magazine&vp_vt=any&n=40
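Because n accepts only a handful of numbers, it is easy to request an unsupported count such as 50. This Python sketch checks n against the six values given above before building the URL, assuming the base URL http://search.yahoo.com/search; results_url is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "http://search.yahoo.com/search"
# The only result counts the text says n accepts.
N_VALUES = (10, 15, 20, 30, 40, 100)


def results_url(phrase, n):
    """Exact-phrase search (vp) returning n results per page."""
    if n not in N_VALUES:
        raise ValueError(f"n must be one of {N_VALUES}, got {n!r}")
    # urlencode converts the integer n to its string form.
    return f"{BASE_URL}?{urlencode({'vp': phrase, 'vp_vt': 'any', 'n': n})}"


print(results_url("astronomy magazine", 40))
# http://search.yahoo.com/search?vp=astronomy+magazine&vp_vt=any&n=40
```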
There are other variables in advanced search URLs, but these are a few that will affect the content of search results. Now that you know why the initial Advanced Web Search URL was so long, you can use some of the variables to create your own advanced Yahoo! searches on the fly.