CHAPTER 11: Understand Text Analytics

“Seek success but prepare for vegetables.”

InspireBot™, an AI bot “dedicated to generating unlimited amounts of unique inspirational quotes.”¹

The last several chapters have dealt with data as we commonly understand it. For most of us, datasets are tables with rows and columns. That's structured data. In reality, though, most of the data you interact with every day is unstructured. It's in the text you read. It's in the words and sentences of emails, news articles, social media posts, Amazon product reviews, Wikipedia articles, and this book in your hands.

That unstructured textual data is ripe for analysis, but it has to be treated a bit differently. That's what this chapter is about.

EXPECTATIONS OF TEXT ANALYTICS

Before we dive in, we want to set your expectations. Text analytics has received a lot of attention over the years. One example is sentiment analysis: the ability to identify the positive or negative emotions behind a social media post, a comment, or a complaint. But as you'll see, text analytics is not an easy thing to do well. By the end of this chapter, you'll have a sense of why some companies succeed with it while others have their work cut out for them.

Many people have preconceived notions about what's possible with computers and human language, undoubtedly influenced by the tremendous success of IBM's Watson computer on the quiz show Jeopardy! in 2011² and the more recent advancements in speech-recognition ...
