Chapter 2. Natural Language Processing

Humans have been producing written language for thousands of years, and we’ve become pretty good at reading and interpreting it quickly. Most native speakers of a language can pick up on intention, tone, slang, and abbreviations in both written and spoken form. But machines are another story. As early as the 1950s, computer scientists began attempting to use software to process and analyze text: its components, sentiment, parts of speech, and the various entities that make up a body of text. Until relatively recently, processing and analyzing language has been quite a challenge.

Ever since IBM’s Watson won on the game show Jeopardy! in 2011, the promise of machines that can understand language has slowly edged closer. In today’s world, where people live out their lives through social media, the opportunity to gain insights from the millions of words of text produced every day has led to an arms race. New tools allow developers to easily create models that understand words as they are used in the context of a particular industry. This leads to better business decisions, and in many industries it has turned being first to deliver into a high-stakes competition.

Strikingly, 90% of the world’s data was created in the past two years, and 80% of that data is unstructured. Insights valuable to the enterprise are hidden in this data—which ranges from emails to customer support discussions to research reports. This information is incredibly ...
