Chapter 2. Natural Language Processing

Our intelligence is what makes us human, and AI is an extension of that quality.

Yann LeCun, professor, New York University

Humans have been creating the written word for thousands of years, and we’ve become quite good at reading and interpreting it quickly. Most native speakers of a language can process intention, tone, slang, and abbreviations well, in both written and spoken form. But machines are another story. As early as the 1950s, computer scientists began attempting to use software to process and analyze textual components, sentiment, parts of speech, and the various entities that make up a body of text. Until relatively recently, processing and analyzing language was quite a challenge.

Ever since IBM’s Watson won on the game show Jeopardy! in 2011, the promise of machines being able to understand language has slowly edged closer to reality. In today’s world, where people live out their lives through social media, the opportunity to gain insights from the millions of words of text produced every day has led to an arms race. New tools allow developers to easily create models that understand words as they are used in the context of a particular industry. These models can inform better business decisions, and many industries are now in a high-stakes competition to be the first to deliver them.

A 2017 study from IBM reported that 90% of the world’s data had been created in the past two years, and that 80% of that data was unstructured ...
