Chapter 12. Text Mining

 

"I think it's much more interesting to live not knowing than to have answers which might be wrong."

 
 -- Richard Feynman

The world is awash in textual data. If you Google, Bing, or Yahoo how much of the world's data is unstructured, that is, in a textual format, you will find estimates ranging from 80 to 90 percent. The exact number doesn't matter. What does matter is that a large proportion of data is in a text format. The implication is that anyone seeking insights in that data must develop the capability to process and analyze text.

When I first started out as a market researcher, I used to manually pore through page after page of moderator-led focus groups and interviews with the hope of capturing some qualitative insight—an ...

