Python Data Science Essentials

by Alberto Boschetti
April 2015
Content level: Beginner
258 pages
5h 48m
English
Packt Publishing
Content preview from Python Data Science Essentials

A peek into Natural Language Processing (NLP)

This section is not strictly about machine learning, but it presents some machine learning results from the area of Natural Language Processing. Python has many toolkits for processing text data; one of the most complete and widely used is NLTK, the Natural Language Toolkit.

In the following sections, we'll explore its core functionalities. We will work with English; for other languages, you first need to download the relevant language corpora (note that some languages have no free, open source corpora for NLTK).

Word tokenization

Tokenization is the action of splitting text into words. Splitting on whitespace seems very easy, but it is not, because text contains punctuation and contractions. ...
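
To make the point about punctuation and contractions concrete, here is a minimal sketch that uses only the Python standard library. The sample sentence, the `TOKEN_PATTERN` regular expression, and the `simple_tokenize` helper are assumptions made for illustration; they are not NLTK's tokenizer and not necessarily what the book shows. NLTK's own `nltk.word_tokenize` handles many more cases, but it needs tokenizer data fetched with `nltk.download` first.

```python
import re

# Hypothetical sample sentence with a contraction and punctuation.
text = "Don't panic, it's only tokenization."

# Naive approach: split on whitespace. Punctuation stays glued to the words.
whitespace_tokens = text.split()

# Illustrative pattern (an assumption, not NLTK's algorithm): either a word,
# optionally with one internal apostrophe part, or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def simple_tokenize(sentence):
    """Return words and punctuation marks as separate tokens."""
    return TOKEN_PATTERN.findall(sentence)


regex_tokens = simple_tokenize(text)

print(whitespace_tokens)
# ["Don't", 'panic,', "it's", 'only', 'tokenization.']
print(regex_tokens)
# ["Don't", 'panic', ',', "it's", 'only', 'tokenization', '.']
```

The whitespace split leaves `panic,` and `tokenization.` as single tokens, while the pattern separates the punctuation. Note that this sketch keeps contractions such as `Don't` whole; a full tokenizer may choose to split them differently.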


ISBN: 9781785280429