May 2020
Intermediate to advanced
404 pages
10h 52m
English
The dataset contains more data than we need for the demo at hand. Product names are encrypted for privacy reasons, as are user names. To prepare our demo, we will extract the ProductId, UserId, Score, and Text columns:
data = df[['ProductId', 'UserId', 'Score', 'Text']]
Keeping data encrypted and free of personal information is a challenge in data science. It is important to remove any parts of the dataset that could be used to identify the private entities it contains. For example, you would need to remove the names of people and organizations from the review text to prevent the products and users from being identified, despite them having encrypted ...
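Removing people and organization names from review text is often done with a named-entity recognizer. As a simpler stand-in, the following minimal sketch assumes we already have a list of names to remove. The function `redact_names`, the list `NAMES_TO_REDACT`, and the `[REDACTED]` placeholder are hypothetical, and the sample review is invented for illustration.

```python
import re
from typing import Sequence

# Hypothetical list of names to redact. In practice these would come from a
# named-entity recognizer or a curated lookup table (assumption).
NAMES_TO_REDACT = ["Jane Doe", "Acme Corp"]


def redact_names(text: str, names: Sequence[str], placeholder: str = "[REDACTED]") -> str:
    """Replace every whole-word, case-insensitive occurrence of each name.

    Assumes each name starts and ends with a word character, so that the
    \\b word boundaries in the pattern behave as expected.
    """
    if not names:
        return text
    # Try longer names first so "Acme Corp" is matched before a shorter "Acme".
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(placeholder, text)


review = "I bought this from Acme Corp after Jane Doe recommended it."
print(redact_names(review, NAMES_TO_REDACT))
# prints: I bought this from [REDACTED] after [REDACTED] recommended it.
```

With pandas, the same function could be applied to the review column with `data['Text'].apply(lambda t: redact_names(t, NAMES_TO_REDACT))`. Matching whole words avoids redacting fragments of unrelated words (for example, "Acme" inside "Acmeville"), but a fixed list misses any name it does not contain, which is why an entity recognizer is the more common choice.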