Text analytics sits at the intersection of machine learning, mathematics, linguistics, and natural language processing. Referred to as text mining in older literature, it attempts to extract information and infer higher-level concepts, sentiment, and semantic details from unstructured and semi-structured data. Traditional keyword searches are insufficient for this task: noisy, ambiguous, and irrelevant tokens and concepts need to be filtered out based on the actual context.
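To make the keyword-search limitation concrete, here is a minimal sketch in plain Python. It is an illustration, not a method from the text. The sample documents, the `ANIMAL_CONTEXT` term set, and the helper names `tokenize`, `keyword_search`, and `context_search` are all hypothetical, and the "context" check is a deliberately naive co-occurrence test standing in for the richer techniques that text analytics uses.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def keyword_search(docs, keyword):
    """Return every document containing the keyword, regardless of meaning."""
    return [d for d in docs if keyword in tokenize(d)]

def context_search(docs, keyword, context_terms):
    """Keep a keyword hit only if at least one context term co-occurs with it."""
    hits = []
    for d in docs:
        tokens = set(tokenize(d))
        if keyword in tokens and tokens & context_terms:
            hits.append(d)
    return hits

# Hypothetical sample data: "jaguar" is ambiguous (animal vs. car maker).
DOCS = [
    "The jaguar stalked its prey through the rainforest.",
    "Jaguar unveiled a new electric sedan at the auto show.",
    "Our jaguar enclosure at the zoo reopens next week.",
]
# Hypothetical context vocabulary for the "animal" sense of the word.
ANIMAL_CONTEXT = {"prey", "rainforest", "zoo", "habitat"}

all_hits = keyword_search(DOCS, "jaguar")
animal_hits = context_search(DOCS, "jaguar", ANIMAL_CONTEXT)

print(len(all_hits), "keyword hits;", len(animal_hits), "hits in the animal context")
```

The plain keyword search returns all three documents, including the one about the car maker, while the context-aware variant keeps only the two about the animal. A hand-written term set like this does not scale, which is one reason statistical and machine learning approaches are used instead.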

Ultimately, for a given set of documents (text, tweets, web, and social media), we are trying to determine what the gist of the communication is and what concepts it is trying to convey (topics and ...

