Building queries to search only recent commentary appearing in weblogs.
Time was, when you needed to find current commentary, you didn't turn to a full-text search engine like Google. You searched Usenet, combed mailing lists, or searched through current news sites like CNN.com and hoped for the best.
But as search engines have evolved, they've been able to index pages more often than once every few weeks. In fact, Google tunes its engine to index sites with a high information churn rate more readily. At the same time, a phenomenon called the weblog (http://www.oreilly.com/catalog/essblogging/) has arisen: an online site that keeps a running commentary and associated links, updated daily and, in many cases, even more often. Google indexes many of these sites on an accelerated schedule. If you know how to find them, you can build a query that searches just these sites for recent commentary.
When weblogs first appeared on the Internet, they were generally updated manually or with homemade programs, so there were no standard words you could add to a search engine query to find them. Now, however, many weblogs are created using either specialized software packages (like Movable Type, http://www.movabletype.org/, or Radio Userland, http://radio.userland.com/) or web services (like Blogger, http://www.blogger.com/). Weblogs produced by these programs and services are easier to find online with some clever use of special syntaxes [Section 1.5] or magic words.
For hosted weblogs, the site: syntax makes things easy. Blogger weblogs hosted at blog*spot (http://www.blogspot.com/) can be found using site:blogspot.com. Even though Radio Userland is a software program able to post its weblogs to any web server, you can find the majority of Radio Userland weblogs at the Radio Userland community server (http://radio.weblogs.com/) using site:radio.weblogs.com.
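As a minimal Python sketch of the site: restriction, the snippet below only assembles the query string and a search URL; it does not fetch anything. The function names are hypothetical, and the https://www.google.com/search?q= URL form is an assumption based on how Google search links commonly look.

```python
from urllib.parse import urlencode


def site_query(topic: str, host: str) -> str:
    """Restrict a topic search to one weblog host with the site: syntax."""
    return f"{topic} site:{host}"


def search_url(query: str) -> str:
    """Build a search URL, assuming Google's common /search?q= form."""
    # urlencode percent-encodes quotes and colons and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


q = site_query('"baseball strike"', "blogspot.com")
print(q)              # "baseball strike" site:blogspot.com
print(search_url(q))  # https://www.google.com/search?q=%22baseball+strike%22+site%3Ablogspot.com
```

Swapping the host for radio.weblogs.com gives the Radio Userland variant.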
Finding weblogs powered by weblog software and hosted elsewhere is more problematic; Movable Type weblogs, for example, can be found all over the Internet. However, most of them sport a "powered by movable type" link of some sort; searching for the phrase "powered by movable type" will, therefore, find many of them.
It all comes down to magic words typically found on weblog pages: shout-outs, if you will, to the software or hosting sites. The following list shows some of these packages and services and the magic words that find them in Google:
- Blogger: "powered by blogger" or site:blogspot.com
- Blosxom: "powered by blosxom"
- Greymatter: "powered by greymatter"
- Geeklog: "powered by geeklog"
- Manila: "a manila site" or site:editthispage.com
- Pitas (a service): site:pitas.com
- pMachine: "powered by pmachine"
- uJournal (a service): site:ujournal.org
- LiveJournal (a service): site:livejournal.com
- Radio Userland: intitle:"radio weblog" or site:radio.weblogs.com
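The list above can be captured as a lookup table. In this Python sketch (all names hypothetical), a topic is paired with one package's magic words, and alternatives are joined with Google's uppercase OR operator. It assumes OR applies only to the terms immediately on either side of it, so the topic itself stays required. Movable Type is added from the earlier paragraph.

```python
# Magic words from the list above; Movable Type comes from the earlier paragraph.
MAGIC_WORDS = {
    "Blogger": ['"powered by blogger"', "site:blogspot.com"],
    "Blosxom": ['"powered by blosxom"'],
    "Greymatter": ['"powered by greymatter"'],
    "Geeklog": ['"powered by geeklog"'],
    "Manila": ['"a manila site"', "site:editthispage.com"],
    "Movable Type": ['"powered by movable type"'],
    "Pitas": ["site:pitas.com"],
    "pMachine": ['"powered by pmachine"'],
    "uJournal": ["site:ujournal.org"],
    "LiveJournal": ["site:livejournal.com"],
    "Radio Userland": ['intitle:"radio weblog"', "site:radio.weblogs.com"],
}


def weblog_query(topic: str, package: str) -> str:
    """Append one package's magic words to a topic, ORing any alternatives."""
    alternatives = MAGIC_WORDS[package]  # raises KeyError for unknown packages
    return topic + " " + " OR ".join(alternatives)


print(weblog_query('"baseball strike"', "Blogger"))
print(weblog_query('"baseball strike"', "Blosxom"))
```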
Because you can't have more than 10 words in a Google query, there's no way to build a single query that includes every conceivable weblog's magic words. It's best to experiment with the various magic words and see which weblogs carry the material you're interested in.
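One way to live within that limit is to add magic words greedily, skipping any that would push the query over. The Python sketch below assumes, conservatively, that every whitespace-separated token counts as a word, including each word inside a quoted phrase, operators like site:, and OR itself; the text doesn't say exactly how Google counts these, and the function names are hypothetical.

```python
MAX_WORDS = 10  # the per-query word limit described above


def word_count(query: str) -> int:
    """Count words in a query.

    Conservative assumption: every whitespace-separated token counts,
    including each word in a quoted phrase, operators such as site:, and OR.
    """
    return len(query.split())


def fit_magic_words(topic: str, candidates: list, limit: int = MAX_WORDS) -> str:
    """OR together as many magic words as fit, skipping any that overflow."""
    query = topic
    added = 0
    for magic in candidates:
        # The first magic word is simply appended; later ones are ORed in.
        trial = f"{query} {magic}" if added == 0 else f"{query} OR {magic}"
        if word_count(trial) > limit:
            continue  # too long; a shorter candidate later may still fit
        query = trial
        added += 1
    return query


candidates = [
    "site:blogspot.com",
    '"powered by blosxom"',
    'intitle:"radio weblog"',
    '"powered by movable type"',
]
query = fit_magic_words('"baseball strike"', candidates)
print(query, word_count(query))
```

Here the Movable Type phrase is dropped because it would take the query from 10 to 15 counted words; reordering the candidates changes which weblogs make the cut, which is one way to run the experiments suggested above.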
First of all, realize that weblogs are usually informal commentary, so you'll have to keep an eye out for misspelled words, names, and so on. Generally, it's better to search by event than by name, if possible. For example, if you were looking for commentary on a potential baseball strike, the phrase "baseball strike" would be a better initial search than the name of the Commissioner of Major League Baseball, Bud Selig.
You can also try searching for a word or phrase relevant to the event. For example, for a baseball strike you could try searching for "baseball strike" "red sox" (or "baseball strike" bosox). If you're searching for information on a wildfire and wondering whether anyone had been arrested for arson, try wildfire arrested, and if that doesn't work, wildfire arrested arson. (Why not search for arson to begin with? Because it's not certain that a weblog commentator would use the word "arson." Instead, he might just refer to someone being arrested for setting the fire. "Arrested" in this case is a more certain word than "arson.")
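The broad-to-narrow strategy in the last two paragraphs can be written down as a list of queries to try in order. This is a minimal Python sketch with a hypothetical refinements() helper; it only generates the query strings and leaves running them, and judging whether the results are good enough, to you.

```python
def refinements(base_terms: list, extra_terms: list) -> list:
    """Return queries from broadest to narrowest, adding one extra term per step."""
    terms = list(base_terms)  # copy so the caller's list is not modified
    queries = [" ".join(terms)]
    for term in extra_terms:
        terms.append(term)
        queries.append(" ".join(terms))
    return queries


# Start with the more certain word ("arrested") and add "arson" only if needed.
for q in refinements(["wildfire", "arrested"], ["arson"]):
    print(q)

# The baseball example: the event phrase first, then a related phrase.
for q in refinements(['"baseball strike"'], ['"red sox"']):
    print(q)
```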