Skip to Content
Data Science from Scratch
book

Data Science from Scratch

by Joel Grus
April 2015
Beginner
328 pages
7h 18m
English
O'Reilly Media, Inc.
Content preview from Data Science from Scratch

Chapter 20. Natural Language Processing

They have been at a great feast of languages, and stolen the scraps.

William Shakespeare

Natural language processing (NLP) refers to computational techniques involving language. It’s a broad field, but we’ll look at a few techniques both simple and not simple.

Word Clouds

In Chapter 1, we computed word counts of users’ interests. One approach to visualizing words and counts is word clouds, which artistically lay out the words with sizes proportional to their counts.

Generally, though, data scientists don’t think much of word clouds, in large part because the placement of the words doesn’t mean anything other than “here’s some space where I was able to fit a word.”

If you ever are forced to create a word cloud, think about whether you can make the axes convey something. For example, imagine that, for each of some collection of data science–related buzzwords, you have two numbers between 0 and 100—the first representing how frequently it appears in job postings, the second how frequently it appears on resumes:

data = [ ("big data", 100, 15), ("Hadoop", 95, 25), ("Python", 75, 50),
         ("R", 50, 40), ("machine learning", 80, 20), ("statistics", 20, 60),
         ("data science", 60, 70), ("analytics", 90, 3),
         ("team player", 85, 85), ("dynamic", 2, 90), ("synergies", 70, 0),
         ("actionable insights", 40, 30), ("think out of the box", 45, 10),
         ("self-starter", 30, 50), ("customer focus", 65, 15),
         ("thought leadership", 35, 35)]

The word cloud approach is ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Data Science from Scratch, 2nd Edition

Data Science from Scratch, 2nd Edition

Joel Grus
Doing Data Science

Doing Data Science

Cathy O'Neil, Rachel Schutt
Learning Data Science

Learning Data Science

Sam Lau, Joseph Gonzalez, Deborah Nolan

Publisher Resources

ISBN: 9781491901410Errata Page