Jupyter Cookbook

by Toomey, Nikhil Borkar, Nikhil Akki, Juan Tomás Oliva Ramos
April 2018
Content level: Beginner
238 pages
7h 13m
English
Packt Publishing
Content preview from Jupyter Cookbook

How it works...

There are several basic text processing techniques in use here. First, we build a corpus of the text. A corpus is a collection of text streams, typically paragraphs or pages of a book.

We then clean up the corpus in several steps:

  • Convert all of the text to lowercase: This lets us match and index words in the text without regard to capitalization.
  • Remove punctuation: Punctuation marks carry no meaning for this analysis.
  • Remove numbers: Again, we are looking for themes in the page, and numbers do not contribute to them.
  • Remove stop words: Remove very common words that carry little meaning, such as the, and, and then. I am not aware of a stop words set that excludes all of the HTML tags present on web pages.
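The four cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the recipe's own code: the function name `clean_document`, the tiny `STOP_WORDS` set, and the sample corpus are all assumptions made for the example.

```python
import re
import string

# Assumption: a tiny illustrative stop word set, not a standard list.
STOP_WORDS = {"the", "and", "then", "a", "of", "in", "to", "is"}

# Translation table that maps every ASCII punctuation character to a space.
PUNCTUATION_TABLE = str.maketrans(
    string.punctuation, " " * len(string.punctuation)
)


def clean_document(text, stop_words=STOP_WORDS):
    """Apply the four cleaning steps to one document and return its words."""
    text = text.lower()                       # 1. convert to lowercase
    text = text.translate(PUNCTUATION_TABLE)  # 2. remove punctuation
    text = re.sub(r"\d+", " ", text)          # 3. remove numbers
    # 4. split on whitespace and drop stop words
    return [word for word in text.split() if word not in stop_words]


# Hypothetical corpus: each string stands in for one paragraph or page.
corpus = [
    "The Cat sat on the mat, and then slept for 3 hours!",
    "Then the dog chased the cat.",
]

cleaned = [clean_document(doc) for doc in corpus]
print(cleaned)
```

Note that `string.punctuation` covers ASCII punctuation only, and that HTML tags would need their own handling before these steps; neither is addressed in this sketch.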

We can now produce a document matrix from the corpus. This produces a word index ...
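To make the idea of a document matrix concrete, here is a small Python sketch, assuming the usual document-term layout: one row per document, one column per distinct word, and each cell holding a word count. The function name `document_term_matrix` and the sample documents are hypothetical, and the recipe itself may build the matrix with different tooling.

```python
from collections import Counter

# Hypothetical cleaned documents, each already reduced to a list of words.
docs = [
    ["cat", "sat", "mat", "cat"],
    ["dog", "chased", "cat"],
]


def document_term_matrix(docs):
    """Return (vocabulary, matrix) with one row per document.

    matrix[i][j] is the number of times vocabulary[j] occurs in docs[i].
    """
    # The sorted vocabulary acts as the word index for the columns.
    vocabulary = sorted({term for doc in docs for term in doc})
    counts = [Counter(doc) for doc in docs]
    # A Counter returns 0 for words that a document does not contain.
    matrix = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, matrix


vocabulary, matrix = document_term_matrix(docs)
print(vocabulary)
for row in matrix:
    print(row)
```

Summing a column gives the total frequency of that word across the corpus, which is one simple way to surface the most common themes.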


Publisher Resources

ISBN: 9781788839440