Lesson 23
Pulling It All Together: Word Analysis in Python

In this lesson, we will apply what we've learned up to this point to perform a common text analysis process using Python. Specifically, we will build a project that takes a dataset and analyzes it to calculate the number of times each word appears.
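To make the goal concrete, here is a minimal sketch of word counting on a single string. It is not the lesson's final solution. The function name count_words, the sample sentence, and the rule that a "word" is a run of letters or apostrophes are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercase word to how often it appears."""
    # Assumption: a "word" is any run of letters or apostrophes.
    # Lowercasing first means "It" and "it" are counted together.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text, not taken from the dataset.
sample = "It has held up well, and it still sounds fresh."
counts = count_words(sample)

# Show the three most frequent words with their counts.
print(counts.most_common(3))
```

A Counter behaves like a dictionary, so counts["it"] looks up the frequency of a single word, and words that never appear simply report a count of 0.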

EXAMINE THE DATA

When we start any data analysis process, the first step is to ensure that the data is available and in a format that our system can use. For our project, we will need to download the data. We will use the Digital Music review set from Julian McAuley's Amazon product data website, which you can find at http://jmcauley.ucsd.edu/data/amazon/. This file, reviews.json, can also be found in the data folder of the downloadable zip file for this book (JobReadyPython.zip), available at www.wiley.com/go/jobreadypython.

The reviews.json file is in a modified JSON format. If you open the extracted file using any text editor, the first two records look like the following:

{"reviewerID": "A3EBHHCZO6V2A4", "asin": "5555991584", "reviewerName": "Amaranth \"music fan\"", "helpful": [3, 3], "reviewText": "It's hard to believe \"Memory of Trees\" came out 11 years ago;it has held up well over ...
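One plausible reading of "modified JSON" is that each line of the file is its own JSON object, rather than the whole file being a single JSON document. Under that assumption, the records can be parsed one line at a time, as sketched below. The helper name iter_reviews and the two stand-in records are hypothetical. Only the field names reviewerID and reviewText come from the sample record shown above.

```python
import io
import json


def iter_reviews(lines):
    """Yield one dict per non-empty line, assuming one JSON object per line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Made-up stand-in for the real file so the sketch runs on its own.
# The second record contains escaped quotes, as the real data does.
sample_file = io.StringIO(
    '{"reviewerID": "R1", "reviewText": "Great album"}\n'
    '{"reviewerID": "R2", "reviewText": "Not my \\"favorite\\" record"}\n'
)

reviews = list(iter_reviews(sample_file))
for review in reviews:
    print(review["reviewerID"], "->", review["reviewText"])
```

With the real data, the same function could be given an open file object in place of sample_file, for example the result of open("reviews.json", encoding="utf-8"), adjusting the path to wherever the file was extracted.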
