on-demand course

Get Started with Natural Language Processing Using Python, Spark, and Scala

with O'Reilly Media, Inc.

March 2017

Beginner to intermediate

5h 47m

English

O'Reilly Media, Inc.

Closed Captioning available in German, English, Spanish, French, Japanese, Korean, Portuguese (Portugal, Brazil), Chinese (Simplified), Chinese (Traditional)

Includes: Badge

Course outline

Welcome to the Course (1m 39s)
Natural Language Understanding in Examples (10m 9s)
Building an NLP Pipeline (15m 49s)
Commonly Used Annotators (8m 47s)
Detecting Positive, Negative & Speculative Polarity (12m 9s)
Machine Learned Annotators (12m 16s)
NLP Pipelines are Domain Specific (6m 55s)
Unified Medical Language System (UMLS) (3m 33s)
Coding Custom Annotators (7m 17s)
Training & Using Machine Learned Annotators (9m 45s)
The Need for Learned and Updated Ontologies (9m 39s)
Learning New Medical Concepts and Relationships (19m 37s)
An End-to-End Reference Architecture (4m 19s)
Spark, SparkSQL, Cassandra Workflow (3m 16s)
ElasticSearch & SparkSQL (6m 52s)
Language is Source and Domain-Specific (9m 32s)
Welcome to the Course (1m 37s)
Notebook 1: Introduction (2m 35s)
Annotation Library (4m 15s)
Basic Annotators (8m 59s)
Vocabulary Analysis (9m 30s)
Exercise: Building a stopword annotator (5m 6s)
Notebook 2: Introduction (2m 14s)
Model-based Annotators (4m 18s)
Creating a Binary Classifier (14m 38s)
Exercise: Predicting score or popularity (5m 30s)
Notebook 3: Introduction (2m 12s)
K-Means clustering (7m 3s)
LDA topic modeling (7m 39s)
Exercise: Using topics for score or popularity prediction (2m 36s)
Notebook 4: Introduction (2m 7s)
Word2Vec (5m 5s)
Expanding genre entity lists (4m 49s)
Exercise: Using Word2Vec based features for score or popularity prediction (2m 44s)

Overview

Whether you’re a programmer with little to no knowledge of Python or an experienced data scientist or engineer, this course walks you through natural language processing in both Python and Scala. It also shows you how to apply popular text-mining tools, including Spark, scikit-learn, spaCy, NLTK, and gensim.

You’ll learn the most common techniques for processing text, how to use machine learning to generate annotators and apply them within a data pipeline, and how NLP pipelines differ from other approaches to semantic text mining. You’ll work with standard UIMA annotators, custom annotators, and machine-learned annotators. You'll also see how text processing pipeline architectures can incorporate popular big data tools such as Kafka, Spark, Spark SQL, Cassandra, and Elasticsearch.
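To make the idea of "annotators within a pipeline" concrete, here is a minimal sketch in plain Python. It assumes a UIMA-like stand-off model, in which each annotator reads a shared document and adds typed spans over its text. This is neither the course's code nor the Apache UIMA API (UIMA itself is a Java framework). The class names, the tiny `STOPWORDS` list and the `DISEASES` dictionary are hypothetical illustrations.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A typed span [begin, end) over the document text, with optional features."""
    kind: str
    begin: int
    end: int
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

    def select(self, kind):
        """Return all annotations of one kind, ordered by start offset."""
        return sorted((a for a in self.annotations if a.kind == kind), key=lambda a: a.begin)

    def covered_text(self, ann):
        return self.text[ann.begin:ann.end]


def tokenizer(doc):
    """Add one Token annotation per run of word characters."""
    for m in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation("Token", m.start(), m.end()))


# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "with", "was", "for"}


def stopword_annotator(doc):
    """Flag each Token as a stopword or not; depends on tokenizer having run first."""
    for tok in doc.select("Token"):
        tok.features["is_stopword"] = doc.covered_text(tok).lower() in STOPWORDS


# Hypothetical dictionary of multi-token entity names (lowercased token tuples).
DISEASES = {("type", "2", "diabetes"), ("hypertension",)}


def dictionary_entity_annotator(doc):
    """Add a Disease annotation wherever a token sequence exactly matches a dictionary entry."""
    tokens = doc.select("Token")
    words = [doc.covered_text(t).lower() for t in tokens]
    for i in range(len(tokens)):
        for entry in DISEASES:
            n = len(entry)
            if tuple(words[i:i + n]) == entry:
                doc.annotations.append(Annotation("Disease", tokens[i].begin, tokens[i + n - 1].end))


def run_pipeline(text, annotators):
    """Run annotators in order; later annotators can read what earlier ones added."""
    doc = Document(text)
    for annotate in annotators:
        annotate(doc)
    return doc


doc = run_pipeline(
    "Patient with type 2 diabetes and hypertension.",
    [tokenizer, stopword_annotator, dictionary_entity_annotator],
)
print([doc.covered_text(a) for a in doc.select("Disease")])
# ['type 2 diabetes', 'hypertension']
```

Keeping annotations as spans over unchanged text, rather than rewriting the text, lets each annotator build on earlier results without destroying information. A machine-learned annotator could replace the dictionary lookup without changing the rest of the pipeline.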

By the end of the course, you will be able to build a natural language processing and entity extraction pipeline. You will also have a solid understanding of the capabilities and limitations of natural language text processing.

Materials or downloads needed in advance: Example files

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month, and much more.

More than 5,000 organizations count on O’Reilly

O’Reilly covers everything we've got, with content to help us build a world-class technology community, upgrade the capabilities and competencies of our teams, and improve overall team performance as well as their engagement.

Julian F.

Head of Cybersecurity

I wanted to learn C and C++, but it didn't click for me until I picked up an O'Reilly book. When I went on the O’Reilly platform, I was astonished to find all the books there, plus live events and sandboxes so you could play around with the technology.

Addison B.

Field Engineer

I’ve been on the O’Reilly platform for more than eight years. I use a couple of learning platforms, but I'm on O'Reilly more than anybody else. When you're there, you start learning. I'm never disappointed.

Amir M.

Data Platform Tech Lead

I'm always learning. So when I got on to O'Reilly, I was like a kid in a candy store. There are playlists. There are answers. There's on-demand training. It's worth its weight in gold, in terms of what it allows me to do.

Mark W.

Embedded Software Engineer

Publisher Resources

ISBN: 9781491985854

Introduction

Getting Started: Basic String Processing In Python

Converting Text To Symbols: Tokenization In NLTK and spaCy

Going Subsymbolic: Vector Representations

Finding The Structure Of Text: Parsing In spaCy

Determining How The Writer Feels: Sentiment Analysis In VADER

Making Decisions: Text Classification

Identifying Discussed Topics: LDA In Gensim

Toward Machine Reading: Entity Extraction And Linking

Conclusion

Part 1: Introduction

Part 2: NLP Pipelines

Part 3: Annotators

Part 4: Custom Annotators

Part 5: Machine Learned Annotators

Part 6: Ontology Enrichment

Part 7: Architecture

Part 8: Parting Advice

Part 1: Building a natural language processing and entity extraction pipeline on Scala & Spark

Part 2: Machine Learning Applications for Statistical Natural Language Understanding at Scale

Part 3: Topic Modeling on Natural Language with Scala, Spark and MLlib

Part 4: Deep Learning Applications for Natural Language Understanding with Scala, Spark and MLlib

You might also like

Natural Language Processing with Spark NLP

Hands-On Python Natural Language Processing

Natural Language Text Processing with Python

Interpretable Machine Learning with Python