Text Mining & Natural Language Understanding at Scale
on-demand course

with David Talby, Claudiu Branzan
July 2016
Content level: Beginner to intermediate
2h 21m
English
O'Reilly Media, Inc.
Closed Captioning available in German, English, Spanish, French, Japanese, Korean, Portuguese (Portugal, Brazil), Chinese (Simplified), Chinese (Traditional)

Overview

To appear truly intelligent, a text mining system must go well beyond indexing and search. First, it should understand language beyond keyword matching. For example, it should be able to distinguish between “Jane has the flu” and “Jane had the flu when she was 9.” Second, it should be capable of making likely inferences even when they are not stated explicitly, such as inferring that Jane may have the flu if she has had a fever, headache, fatigue, and runny nose for three days. Third, it should do its work as part of a robust, scalable, efficient, and easy-to-extend system. This course teaches software engineers and data scientists how to build intelligent text mining systems based on natural language understanding (NLU) at scale, using Java and Scala, with Spark for distributed processing.

  • Learn the meaning of natural language understanding (NLU) and its use in text mining
  • Discover how to build a natural language processing (NLP) pipeline within a big data framework
  • Recognize the differences between NLP pipelines and other approaches to semantic text mining
  • Learn about standard UIMA annotators, custom annotators, and machine-learned annotators
  • Discover how different types of annotators are composed into a text processing pipeline
  • Use machine learning to generate annotators and apply them within a data pipeline
  • See pipeline architectures that incorporate Kafka, Spark, Spark SQL, Cassandra, and Elasticsearch
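
As a rough illustration of how annotators might be composed into a text processing pipeline, here is a minimal sketch in plain Python. It is not taken from the course and does not use the UIMA or Spark APIs. The names `Document`, `Annotation`, `condition_annotator`, `temporality_annotator`, and `run_pipeline` are hypothetical, and the past-tense rule is a deliberately naive assumption chosen only to separate the two "flu" sentences from the overview.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    kind: str    # e.g. "Condition" or "Temporality"
    start: int   # character offset where the annotated span begins
    end: int     # character offset where the annotated span ends
    value: str   # normalized value attached to the span


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# Hypothetical convention: an annotator takes a document and returns it
# with extra annotations attached.
Annotator = Callable[[Document], Document]


def condition_annotator(doc: Document) -> Document:
    """Mark every mention of 'flu' as a Condition (toy dictionary lookup)."""
    for match in re.finditer(r"\bflu\b", doc.text, re.IGNORECASE):
        doc.annotations.append(
            Annotation("Condition", match.start(), match.end(), "flu"))
    return doc


def temporality_annotator(doc: Document) -> Document:
    """Label each Condition as 'past' or 'current'.

    Naive assumption: the word 'had' anywhere before the mention
    means the condition is historical.
    """
    conditions = [a for a in doc.annotations if a.kind == "Condition"]
    for cond in conditions:
        prefix = doc.text[:cond.start].lower()
        status = "past" if re.search(r"\bhad\b", prefix) else "current"
        doc.annotations.append(
            Annotation("Temporality", cond.start, cond.end, status))
    return doc


def run_pipeline(text: str, annotators: List[Annotator]) -> Document:
    """Apply the annotators in order; later ones can read earlier output."""
    doc = Document(text)
    for annotate in annotators:
        doc = annotate(doc)
    return doc


def temporality(text: str) -> List[str]:
    """Convenience wrapper: run both annotators and return the labels."""
    doc = run_pipeline(text, [condition_annotator, temporality_annotator])
    return [a.value for a in doc.annotations if a.kind == "Temporality"]
```

In this sketch, `temporality("Jane has the flu")` yields a "current" label and `temporality("Jane had the flu when she was 9.")` yields a "past" label. The ordering matters: the second annotator depends on annotations produced by the first, which is the basic idea behind composing annotators into a pipeline. A production system would replace the regular-expression rules with far more robust logic.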

David Talby (PhD, Computer Science, Hebrew University) and Claudiu Branzan (Master's, Industrial Intelligent Systems, Polytechnic University of Timișoara) work for the big data analytics firm Atigeo. David is CTO, and Claudiu runs the Modeling and Predictive Analytics team. David and Claudiu co-presented on text mining and natural language understanding at O'Reilly's Strata+Hadoop World London 2016 conference.

Publisher Resources

ISBN: 9781491964309