Amazon Hacks by Paul Bausch

Scrape Product Reviews

Amazon has made some reviews available through its Web Services API, but most are available only on the Amazon.com web site, so grabbing them requires a little screen scraping.

Here’s an even more powerful way to integrate Amazon reviews with your web site. Unlike linking to reviews [Hack #28] or monitoring reviews for changes [Hack #31], this puts the entire text of Amazon reviews on your web site.

The easiest and most reliable way to access customer reviews programmatically is through the Web Services API. Unfortunately, the API exposes only a small fraction of the reviews available. An API query for the book Cluetrain Manifesto, for example, returns three user reviews. If you visit the review page [Hack #28] for that book, though, you’ll find 128 reviews. To reach the rest of the reviews on Amazon.com and use all of them on your own web site, you’ll need to scrape the review pages with a script.

The Code

This Perl script, get_reviews.pl, builds a URL to the reviews page for a given ASIN, uses regular expressions to find the reviews, and breaks each review into its pieces: rating, title, date, reviewer, and the text of the review.

    #!/usr/bin/perl
    # get_reviews.pl
    #
    # A script to scrape Amazon, retrieve reviews, and write to a file
    # Usage: perl get_reviews.pl <asin>

    use strict;
    use warnings;
    use LWP::Simple;

    # Take the asin from the command-line
    my $asin = shift @ARGV or die "Usage: perl get_reviews.pl <asin>\n";

    # Assemble the URL from the passed asin.
    ...
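
The listing above is cut off before the parsing step, so here is a rough sketch of the general technique it describes: using one regular expression to pull each review out of a page and split it into rating, title, date, reviewer, and text. This is not the hack's own code. The markup in $html is invented for illustration and does not reflect Amazon's real page structure, and parse_reviews is a hypothetical helper name. A real script would run the same kind of loop over the page it fetched, with a pattern matched to the actual HTML.

```perl
#!/usr/bin/perl
# Sketch only: the markup below is made up for illustration and is
# NOT Amazon's real review markup.
use strict;
use warnings;

# Hypothetical review markup; a real script would use a fetched page.
my $html = <<'HTML';
<div class="review">
<span class="rating">5</span>
<b class="title">Still relevant</b>
<span class="date">March 3, 2003</span>
<a class="reviewer">A reader</a>
<p class="text">Markets are conversations.</p>
</div>
<div class="review">
<span class="rating">2</span>
<b class="title">Too long</b>
<span class="date">July 9, 2002</span>
<a class="reviewer">Another reader</a>
<p class="text">Good ideas,
repeated often.</p>
</div>
HTML

# parse_reviews (hypothetical helper): returns one hash ref per review.
sub parse_reviews {
    my ($page) = @_;
    my @reviews;

    # /g steps through every match, /s lets . cross newlines,
    # /x permits the whitespace used here to lay the pattern out.
    while ($page =~ m{
        <span\ class="rating">(\d)</span>     \s*
        <b\ class="title">(.*?)</b>           \s*
        <span\ class="date">(.*?)</span>      \s*
        <a\ class="reviewer">(.*?)</a>        \s*
        <p\ class="text">(.*?)</p>
    }gsx) {
        push @reviews, {
            rating   => $1,
            title    => $2,
            date     => $3,
            reviewer => $4,
            text     => $5,
        };
    }
    return @reviews;
}

my @reviews = parse_reviews($html);

# Print each review's pieces, one review per block.
for my $r (@reviews) {
    print "$r->{rating} stars | $r->{title} | $r->{date} | $r->{reviewer}\n";
    print "$r->{text}\n\n";
}
```

The non-greedy (.*?) captures keep each field from running into the next review, and the /s modifier matters because review text often spans several lines. Patterns like this are brittle: any change to the page layout means the expression has to be updated.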
