Google serves as a handy searchable archive for back issues of online publications.
Not all sites have their own search engines, and even the ones that do are sometimes difficult to use. Complicated or incomplete search engines are more pain than gain when attempting to search through archives of published articles. If you follow a couple of rules, Google is handy for finding back issues of published resources.
The trick is to use a common phrase to find the information you're looking for. Let's use the New York Times as an example.
Your first intuition when searching for previously published articles from NYTimes.com might be to simply use site:nytimes.com in your Google query. For example, if I wanted to find articles on George Bush, why not use:
"george bush" site:nytimes.com
This will indeed find you all articles mentioning George Bush published on NYTimes.com. What it won't find is all the articles produced by the New York Times but republished elsewhere.
Tip
While doing research, keep credibility firmly in mind. If you're doing casual research, maybe you don't need to double-check a story to make sure it actually comes from the New York Times, but if you're researching a term paper, double-check the veracity of every article you find that isn't actually on the New York Times site.
What you actually want is a clear identifier, no matter the site of origin, that an article comes from the New York Times. Copyright disclaimers are perfect for the job. A New York Times copyright notice typically reads:
Copyright 2001 The New York Times Company
Of course, this would only find articles from 2001. A simple workaround is to replace the year with a Google full-word wildcard [Hack #13]:
Copyright * The New York Times Company
Let's try that George Bush search again, this time using the snippet of copyright disclaimer instead of the site: restriction:
"Copyright * The New York Times Company" "George Bush"
At this writing, you get over three times as many results for this search as for the earlier attempt.
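If you build queries like this often, the year-to-wildcard substitution can be scripted. The following is only a minimal sketch: the helper names wildcard_year and phrase_query are hypothetical, and it assumes the copyright notice contains a single four-digit year starting with 19 or 20.

```python
import re


def wildcard_year(notice: str) -> str:
    """Replace any four-digit year (19xx or 20xx) with Google's * wildcard."""
    return re.sub(r"\b(?:19|20)\d{2}\b", "*", notice)


def phrase_query(notice: str, *terms: str) -> str:
    """Quote the wildcarded notice and each search term, joined by spaces."""
    parts = [f'"{wildcard_year(notice)}"']
    parts.extend(f'"{term}"' for term in terms)
    return " ".join(parts)


# Hypothetical usage: rebuild the query shown above from the 2001 notice.
query = phrase_query("Copyright 2001 The New York Times Company", "George Bush")
print(query)
```

The printed string can be pasted into the Google search box as-is.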
Copyright disclaimers are also useful for finding magazine articles. For example, Scientific American's typical copyright disclaimer looks like this:
Scientific American, Inc. All rights reserved.
(The date appears before the disclaimer, so I just dropped it to avoid having to bother with wildcards.)
Using that disclaimer as a quote-delimited phrase along with a search word (hologram, for example) yields the Google query:
hologram "Scientific American, Inc. All rights reserved."
At this writing, you'll get one result, which seems like a small number for a general query like hologram. When you get fewer results than you'd expect, fall back on using the site: syntax to go back to the originating site itself.
hologram site:sciam.com
In this example, you'll find several results that you can grab from Google's cache but are no longer available on the Scientific American site.
Most publications that I've come across have some kind of common text string that you can use when searching Google for their archives. Usually it's a copyright disclaimer, and most often it's at the bottom of a page. Use Google to search for that string and whatever query words you're interested in, and if that doesn't work, fall back on searching for the query words and the domain name.
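The two-step approach (disclaimer phrase first, site: restriction second) can be sketched in a few lines. The function names archive_queries and search_url are hypothetical, and the google.com/search?q= URL form is an assumption about how you submit the query. The sketch only builds the query strings and links; it does not fetch or count results.

```python
from urllib.parse import urlencode


def archive_queries(term: str, disclaimer: str, domain: str) -> list[str]:
    """Return the disclaimer-phrase query first, then the site: fallback."""
    primary = f'{term} "{disclaimer}"'
    fallback = f"{term} site:{domain}"
    return [primary, fallback]


def search_url(query: str) -> str:
    """Build a search link; the q= parameter form is an assumption."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical usage with the Scientific American example from the text.
for q in archive_queries(
    "hologram",
    "Scientific American, Inc. All rights reserved.",
    "sciam.com",
):
    print(q)
    print(search_url(q))
```

Try the first query, and move on to the second only if the result count looks too low for the term.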