
Amazon Hacks

by Paul Bausch
August 2003
Content level: Intermediate to advanced
304 pages
7h 33m
English
O'Reilly Media, Inc.
Content preview from Amazon Hacks

Scrape Product Reviews

Amazon has made some reviews available through their Web Services API, but most are available only at the Amazon.com web site, so grabbing them requires a little screen scraping.

Here’s an even more powerful way to integrate Amazon reviews with your web site. Unlike linking to reviews [Hack #28] or monitoring reviews for changes [Hack #31], this puts the entire text of Amazon reviews on your web site.

The easiest and most reliable way to access customer reviews programmatically is through the Web Services API. Unfortunately, the API exposes only a small fraction of the reviews available. An API query for the book Cluetrain Manifesto, for example, returns three user reviews. If you visit the review page [Hack #28] for that book, though, you’ll find 128 reviews. To reach the rest of the reviews on Amazon.com and use all of them on your own web site, you’ll need to do some scripting.

The Code

This Perl script, get_reviews.pl, builds a URL to the reviews page for a given ASIN, uses regular expressions to find the reviews, and breaks each review into its pieces: rating, title, date, reviewer, and the text of the review.

#!/usr/bin/perl
# get_reviews.pl
#
# A script to scrape Amazon, retrieve reviews, and write to a file
# Usage: perl get_reviews.pl <asin>
use strict;
use warnings;
use LWP::Simple;

# Take the asin from the command-line
my $asin = shift @ARGV or die "Usage: perl get_reviews.pl <asin>\n";

# Assemble the URL from the passed asin.
...
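
The listing is cut off in this preview, so the following is a separate, minimal sketch of the technique the script is described as using: match each review with a regular expression and split it into rating, title, date, reviewer, and text. It is not the book's get_reviews.pl. The HTML fragment, its class names, the example.com URL, and the build_reviews_url and parse_reviews helpers are all invented for illustration; Amazon's actual page markup and URL format differ and would need their own pattern.

```perl
#!/usr/bin/perl
# parse_reviews_sketch.pl
#
# Illustrative sketch only. The markup and URL below are hypothetical
# and are NOT Amazon's real page structure or address format.
use strict;
use warnings;

# Hypothetical URL template (assumption): shows only the idea of
# assembling a reviews-page URL from an ASIN.
sub build_reviews_url {
    my ($asin) = @_;
    return "http://www.example.com/reviews/$asin";
}

# Find every review in an HTML string and break it into its pieces.
# The /x flag allows the pattern to be laid out across lines, /s lets
# "." match newlines inside the review text, and /g walks through
# each review in turn.
sub parse_reviews {
    my ($html) = @_;
    my @reviews;
    while ($html =~ m{
        <div\s+class="review">\s*
        <span\s+class="rating">(\d)</span>\s*
        <b>(.*?)</b>\s*
        <span\s+class="date">(.*?)</span>\s*
        <span\s+class="reviewer">(.*?)</span>\s*
        <p>(.*?)</p>\s*
        </div>
    }gsx) {
        push @reviews, {
            rating   => $1,
            title    => $2,
            date     => $3,
            reviewer => $4,
            text     => $5,
        };
    }
    return @reviews;
}

# Invented sample page standing in for the fetched HTML.
my $sample_html = <<'HTML';
<div class="review">
  <span class="rating">5</span>
  <b>A must-read</b>
  <span class="date">July 1, 2003</span>
  <span class="reviewer">A reader</span>
  <p>Changed how I think
about markets.</p>
</div>
<div class="review">
  <span class="rating">2</span>
  <b>Overhyped</b>
  <span class="date">June 12, 2003</span>
  <span class="reviewer">Another reader</span>
  <p>Some good ideas, but repetitive.</p>
</div>
HTML

for my $r (parse_reviews($sample_html)) {
    print "$r->{rating} stars | $r->{title} | $r->{date} | $r->{reviewer}\n";
    print "  $r->{text}\n";
}
```

Scraping with regular expressions is brittle: any change to the page layout breaks the pattern, so a real script needs its pattern checked against the live page and updated whenever the markup changes.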


Publisher Resources

ISBN: 0596005423