Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

Data for text classification

Before diving into the machine learning (ML) problems in text classification, we will look at some of the open datasets available on the internet. Many classification tasks require large amounts of labeled text. These datasets can be broadly grouped by label type: binary (two classes), multi-class (more than two classes, with exactly one per example), and multi-label (each example may carry several labels at once). The following are some of the popular datasets used for benchmarking in research and in competitions, such as those hosted on Kaggle:

No.  Dataset name                        Class type      Source
1    IMDb movie Dataset                  Binary classes  http://ai.stanford.edu/~amaas/data/sentiment/
2    Twitter Sentiment Analysis Dataset  Binary classes  http://thinknook.com/twitter-sentiment-analysis-training-corpus-dataset-2012-09-22/ ...
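The three class types mentioned above differ mainly in how the target labels are represented. The following is a minimal sketch in plain Python, not code from the book: the label names, the toy label lists, and the helper functions one_hot and multi_hot are all hypothetical and chosen only for illustration.

```python
# Toy label sets illustrating the three class types.
# All names and values here are hypothetical examples.

# Binary: every document gets one of two classes (e.g. 1 = positive, 0 = negative).
binary_labels = [1, 0, 1]

# Multi-class: every document gets exactly one of K classes.
multiclass_labels = ["sports", "politics", "tech"]

# Multi-label: every document gets zero or more of K labels.
multilabel_labels = [{"sports", "politics"}, {"tech"}, set()]

# Fixed class order used to turn labels into vectors.
classes = ["politics", "sports", "tech"]


def one_hot(label, classes):
    """Encode a single multi-class label as a vector with exactly one 1."""
    return [1 if c == label else 0 for c in classes]


def multi_hot(labels, classes):
    """Encode a set of labels as a vector with a 1 for each label present."""
    return [1 if c in labels else 0 for c in classes]


multiclass_vectors = [one_hot(y, classes) for y in multiclass_labels]
multilabel_vectors = [multi_hot(y, classes) for y in multilabel_labels]
```

A one-hot vector always sums to 1, while a multi-hot vector can sum to anything from 0 to the number of classes. This is the practical difference between the multi-class and multi-label settings.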