May 2020
Intermediate to advanced
404 pages
10h 52m
English
We will read the Amazon Fine Food Reviews dataset using the ISO-8859-1 encoding. This is only to ensure that we do not lose any special symbols used in the text of the reviews:
import pandas as pd
df = pd.read_csv('Reviews.csv', encoding="ISO-8859-1")
df = df.head(10000)
Since the dataset is very large, we've restricted our work in this chapter to the first 10,000 rows in the dataset.
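To see why the encoding argument matters, here is a minimal standard-library sketch. The sample string is a hypothetical illustration and is not taken from the dataset. It relies on the fact that ISO-8859-1 assigns a character to every possible byte value, so decoding with it cannot fail, whereas the same bytes may be invalid UTF-8:

```python
# Hypothetical sample: the word "café" as it would be stored in an
# ISO-8859-1 encoded file (the accented letter becomes the single byte 0xE9).
raw = "café".encode("ISO-8859-1")

# Decoding as ISO-8859-1 always succeeds: every byte maps to one character.
latin1_text = raw.decode("ISO-8859-1")

# Decoding the same bytes as UTF-8 fails, because 0xE9 announces a
# multi-byte sequence that never arrives.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(latin1_text, utf8_ok)  # café False
```

The same reasoning applies to the encoding passed to pd.read_csv(): a file with such bytes cannot be decoded as UTF-8, while ISO-8859-1 reads it without errors.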
We need to remove stop words from the text and filter out symbols, such as brackets, that are not natural to written text. We will create a function named cleanText(), which will perform the filtering and the removal of stop words:
import string
import re
from nltk.corpus import stopwords

stopwordSet = set(stopwords.words("english"))

def cleanText(line): ...
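The body of cleanText() is not shown in this excerpt, so the following is only a rough sketch of what such a function might look like, not the book's implementation. The names clean_text_sketch, STOP_WORDS_SKETCH, and PUNCT_TABLE are hypothetical. The tiny hard-coded stop word set is an assumption that lets the example run without downloading NLTK data, and the tag-stripping step assumes that reviews may contain HTML-like markup:

```python
import re
import string

# Assumption: a small hard-coded stop word set so the sketch is self-contained.
# The chapter builds its set from NLTK's stopwords.words("english") instead.
STOP_WORDS_SKETCH = {"a", "an", "and", "the", "is", "it", "this", "of", "to", "in", "i"}

# Translation table that deletes every ASCII punctuation character,
# which covers brackets and similar symbols.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text_sketch(line, stop_words=STOP_WORDS_SKETCH):
    """Hypothetical stand-in for cleanText(): strip symbols and drop stop words."""
    line = re.sub(r"<[^>]+>", " ", line)   # replace HTML-like tags with a space
    line = line.translate(PUNCT_TABLE)     # remove brackets and other punctuation
    words = line.lower().split()           # lowercase and split on whitespace
    return " ".join(w for w in words if w not in stop_words)


print(clean_text_sketch("This is the BEST coffee (ever)!<br />I love it."))
# best coffee ever love
```

Passing the stop word set as a parameter means the NLTK-based stopwordSet could be supplied instead, for example clean_text_sketch(text, stopwordSet).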