O'Reilly logo

IPython Interactive Computing and Visualization Cookbook by Cyrille Rossant

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Learning from text – Naive Bayes for Natural Language Processing

In this recipe, we show how to handle text data with scikit-learn. Working with text requires careful preprocessing and feature extraction. It is also quite common to deal with highly sparse matrices.

We will learn to recognize whether a comment posted during a public discussion is considered insulting to one of the participants. We will use a labeled dataset from Impermium, released during a Kaggle competition.

Getting ready

Download the Troll dataset from the book's GitHub repository at https://github.com/ipython-books/cookbook-data.

This dataset was obtained from Kaggle, at www.kaggle.com/c/detecting-insults-in-social-commentary.

How to do it...

  1. Let's import our libraries:
    In [1]: import ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required