7

Text Analysis Is All You Need

In this chapter, we will learn how to analyze text data and build machine learning models on it. We will use the Jigsaw Unintended Bias in Toxicity Classification dataset (see Reference 1). The objective of this competition was to build models that detect toxicity while reducing unintended bias against minorities that might be wrongly associated with toxic comments. With this competition, we introduce the field of Natural Language Processing (NLP).

The data used in the competition originates from the Civil Comments platform, which was founded by Aja Bogdanoff and Christa Mrgan in 2015 (see Reference 2) with the aim of solving the problem of civility in online discussions. When the platform was closed in 2017, they ...
