Scraping Yahoo! Buzz for a Google Search

A proof-of-concept hack that scrapes the buzziest items from Yahoo! Buzz and submits them to a Google search.

No web site is an island. Billions of hyperlinks link to billions of documents. Sometimes, however, you want to take information from one site and apply it to another site.

Unless that site has a web service API like Google’s, your best bet is scraping. Scraping means using an automated program to extract specific bits of information from a web page. People scrape stock quotes, news headlines, prices, and so forth; you name it, and someone’s probably scraped it.
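To make “extract specific bits of information” concrete, here is a minimal sketch using only Python’s standard-library `html.parser`. It assumes, purely for illustration, that the target page marks its headlines up as `<h2 class="headline">`; the class name, the `HeadlineScraper` and `scrape_headlines` names, and the sample page are all hypothetical, and a real site needs its own markup analysis.

```python
from html.parser import HTMLParser


class HeadlineScraper(HTMLParser):
    """Collects the text of every <h2 class="headline"> element.

    The tag and class name are hypothetical; adapt them to the real page.
    """

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_headline = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "headline" in classes:
            self._in_headline = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_headline:
            # Join text from any nested tags (e.g. <em>) into one headline
            self.headlines.append("".join(self._buffer).strip())
            self._in_headline = False

    def handle_data(self, data):
        # Keep only text that sits inside a headline element
        if self._in_headline:
            self._buffer.append(data)


def scrape_headlines(html):
    """Return the list of headline strings found in an HTML document."""
    parser = HeadlineScraper()
    parser.feed(html)
    parser.close()
    return parser.headlines


# Hypothetical sample page standing in for a downloaded document
page = """
<html><body>
  <h2 class="headline">Markets rally</h2>
  <p>Some story text we do not want.</p>
  <h2 class="headline big">New <em>gadget</em> unveiled</h2>
  <h2>Not a headline</h2>
</body></html>
"""

print(scrape_headlines(page))  # ['Markets rally', 'New gadget unveiled']
```

Everything outside the marked elements (the paragraph and the unclassed `<h2>`) is ignored, which is the essence of scraping: keep the few bits you want and discard the rest.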

There’s some controversy about scraping. Some sites don’t mind it, while others can’t stand it. If you decide to scrape a site, do it gently; take the minimum amount of information you need and, whatever you do, don’t hog the scrapee’s bandwidth.
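One way to “do it gently” is sketched below in Python, under stated assumptions: before requesting a page, the fetcher checks the site’s robots.txt rules with the standard library’s `urllib.robotparser`, and it enforces a minimum pause between requests. The `PoliteFetcher` class, the user-agent string, and the 5-second interval are hypothetical choices for illustration, not values from the original hack.

```python
import time
import urllib.request
from urllib.robotparser import RobotFileParser


class PoliteFetcher:
    """Fetches pages no more often than once every `min_interval` seconds,
    and only when the supplied robots.txt rules allow it."""

    def __init__(self, robots_lines, user_agent="example-scraper/0.1",
                 min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        # Hypothetical user agent and interval; choose values suited to the site.
        self.rules = RobotFileParser()
        self.rules.parse(robots_lines)
        self.user_agent = user_agent
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def allowed(self, url):
        """True if the parsed robots.txt rules permit fetching `url`."""
        return self.rules.can_fetch(self.user_agent, url)

    def wait_turn(self):
        """Sleep just long enough to keep requests `min_interval` apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

    def fetch(self, url):
        """Download `url` as text, refusing URLs that robots.txt disallows."""
        if not self.allowed(url):
            raise PermissionError(f"robots.txt disallows {url}")
        self.wait_turn()
        req = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
```

Passing `clock` and `sleep` in as parameters is a design choice that lets the throttle be exercised without real waiting; in normal use the defaults (`time.monotonic` and `time.sleep`) apply.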

So, what are we scraping?

Google has a query popularity page; it’s called Google Zeitgeist (http://www.google.com/press/zeitgeist.html). Unfortunately, the Zeitgeist is only updated once a week and contains only a limited amount of scrapable data. That’s where Yahoo! Buzz (http://buzz.yahoo.com/) comes in. The site is rich with constantly updated information. Its “Buzz Index” keeps tabs on what’s hot in popular culture: celebs, games, movies, television shows, music, and more.

This hack grabs the buzziest of the buzz, the top of the “Leaderboard,” and searches Google ...
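The rest of the hack’s description is cut off here, so the following is only a hedged sketch of the final step, not the original code. It assumes the Leaderboard has already been scraped into a ranked list of strings (the sample list and the `top_of_leaderboard` and `google_search_url` names are hypothetical) and simply builds an ordinary Google web search URL for the top item. It does not use Google’s web service API mentioned earlier.

```python
from urllib.parse import urlencode


def top_of_leaderboard(items):
    """Return the first (buzziest) non-empty item from a ranked list."""
    for item in items:
        item = item.strip()
        if item:
            return item
    raise ValueError("leaderboard is empty")


def google_search_url(term):
    """Build a plain Google search URL; urlencode escapes spaces and symbols."""
    return "https://www.google.com/search?" + urlencode({"q": term})


# Hypothetical scraped Leaderboard, highest-ranked first
leaderboard = ["  Some Band  ", "Some Movie", "Some Game"]
print(google_search_url(top_of_leaderboard(leaderboard)))
# https://www.google.com/search?q=Some+Band
```

Letting `urlencode` handle the query string matters because buzz terms often contain spaces, ampersands, or punctuation that would otherwise break the URL.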
