Amazon provides access to all of its community features through its web site. As more sites integrate closely with Amazon, though, demand is growing to tap into the community through code.
The Web Services API (see Chapter 6) offers some access. When accessing an individual product’s information through the API, you can find the following community data:
The three latest reviews
ASINs of five related items
Three lists that contain the item
This is fantastic information to have access to. Developers are building tools that work with this data in many creative ways. But when compared with the volume of information that’s available on Amazon’s site, the community information in the API is only a small window into the larger community. That leaves one route for integration-minded developers: screen scraping.
The term screen scraping refers to requesting a web page programmatically with a script and picking through the resulting HTML for the interesting data. Finding the data itself involves writing regular expressions, a pattern-matching syntax that can become complicated quickly. For example, here’s a regular expression that extracts a list of books from a purchase circle page [Hack #44]:
You can see some HTML there, and the expressions are based on where the data is within the HTML ...
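To make the idea of matching on the surrounding HTML concrete, here is a minimal sketch in Python. It is not the expression used in the purchase circle hack. The sample markup, the link format, the placeholder ASINs and titles, and the names `fetch_page`, `extract_books`, and `BOOK_PATTERN` are all assumptions made up for illustration, and a real page would need a pattern written against its actual HTML.

```python
import re
import urllib.request

# Hypothetical markup standing in for a list of books on a page.
# This is NOT Amazon's actual HTML; ASINs and titles are placeholders.
SAMPLE_HTML = """
<ol>
  <li><a href="/exec/obidos/ASIN/0000000001/">Example Book One</a></li>
  <li><a href="/exec/obidos/ASIN/0000000002/">Example Book Two</a></li>
</ol>
"""

# The pattern is anchored on the HTML around the data: the link's href
# carries a 10-character ASIN, and the link text carries the title.
BOOK_PATTERN = re.compile(
    r'<a href="/exec/obidos/ASIN/(?P<asin>[0-9A-Z]{10})/?">'
    r'(?P<title>[^<]+)</a>'
)


def fetch_page(url):
    """Request a page programmatically and return its HTML as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_books(html):
    """Return (asin, title) pairs for every link matching BOOK_PATTERN."""
    return [
        (match.group("asin"), match.group("title").strip())
        for match in BOOK_PATTERN.finditer(html)
    ]


if __name__ == "__main__":
    # Parse the local sample; a real script would pass fetch_page(url) instead.
    for asin, title in extract_books(SAMPLE_HTML):
        print(asin, title)
```

Because the pattern depends on the exact markup around each title, even a small change to the page's HTML can stop it from matching, which is the main fragility of screen scraping.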