Natural Language Processing (NLP)

by Bruno Goncalves

Released October 2018

Publisher(s): Pearson

ISBN: 0135258847


Video description

2+ Hours of Video Instruction

Overview

Natural Language Processing LiveLessons covers the fundamentals of natural language processing (NLP). It introduces you to the basic concepts, ideas, and algorithms necessary to develop your own NLP applications in a step-by-step and intuitive fashion. The lessons follow a gradual progression, from the more specific to the more abstract, taking you from the very basics to some of the most recent and sophisticated algorithms.

About the Instructor

Bruno Goncalves is currently a Senior Data Scientist working at the intersection of Data Science and Finance. Previously, he was a Data Science fellow at NYU’s Center for Data Science while on leave from a tenured faculty position at Aix-Marseille Universite. Since completing his PhD in the Physics of Complex Systems in 2008, he has been pursuing the use of Data Science and Machine Learning to study Human Behavior. Using large datasets from Twitter, Wikipedia, web access logs, and Yahoo! Meme, he studied how we can observe both large-scale and individual human behavior in an unobtrusive and widespread manner. The main applications have been to the study of Computational Linguistics, Information Diffusion, Behavioral Change, and Epidemic Spreading. In 2015 he was awarded the Complex Systems Society's 2015 Junior Scientific Award for “outstanding contributions in Complex Systems Science” and in 2018 he was named a Science Fellow of the Institute for Scientific Interchange in Turin, Italy.

Skill Level

Intermediate

Learn How To

Represent text
Model topics
Conduct sentiment analysis
Understand word2vec word embeddings
Define GloVe
Apply language detection

Who Should Take This Course

Data scientists with an interest in natural language processing

Course Requirements

Basic algebra
Calculus and statistics
Programming experience

Lesson Descriptions

Lesson 1: Text Representations
The first step in any NLP application is to establish how text is represented as numbers. One-hot encodings provide us with a sparse approach to representing words and n-grams, while bag-of-words improves memory efficiency even further. Naturally, not all words are meaningful, so the next steps are to remove meaningless stop words and to identify the most relevant words for our application using term frequency/inverse document frequency (TF/IDF). Finally, the lesson covers how to identify the stems of words so you can meaningfully reduce the size of your vocabulary.
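
As a rough illustration of the ideas in this lesson (not taken from the course materials), the sketch below builds bag-of-words counts, removes stop words, and weights the remaining words with one common TF/IDF variant, tf * log(N / df). The toy corpus, the stop-word list, and all function names are assumptions made up for this example.

```python
import math
from collections import Counter

# Hypothetical toy corpus and stop-word list, chosen only for illustration.
DOCS = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
STOP_WORDS = {"the", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def bag_of_words(text):
    """Map each remaining word to the number of times it occurs."""
    return Counter(tokenize(text))


def tf_idf(docs):
    """Return one {word: weight} dict per document, using tf * log(N / df)."""
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for bag in bags for word in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in bag.items()
        })
    return weights


WEIGHTS = tf_idf(DOCS)
```

In this toy corpus, "mat" appears in only one document while "cat" appears in two, so "mat" receives the larger weight in the first document. Real applications would typically add stemming and a more careful tokenizer.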

Lesson 2: Topic Modeling
Lesson 2 builds on the text representations of Lesson 1 to develop ways of identifying the main subject or subjects of a text. Bruno starts by defining topics and how they can be identified. Next, you learn how to perform explicit semantic analysis to find documents mentioning a specific topic and how to cluster documents according to topics. Latent semantic analysis provides yet another powerful way to extract meaning from raw text, while non-negative matrix factorization enables you to identify latent dimensions in the text, perform recommendations, and measure similarities.
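
To make the idea of non-negative matrix factorization more concrete, here is a minimal pure-Python sketch that factors a small document-by-word count matrix V into non-negative factors W (documents by topics) and H (topics by words). It uses the standard multiplicative update rules for the squared-error objective, which may differ from the approach taken in the course. The matrix, the number of topics, and every name in the code are assumptions for illustration only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error between V and the product W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor V ~ W H with non-negative W and H via multiplicative updates."""
    rng = random.Random(seed)
    n_docs, n_words = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_words)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_words)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h


# Hypothetical counts; columns stand for: cat, dog, pet, stock, market, trade.
V = [
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
]
W, H = nmf(V, n_topics=2)
```

Each row of W can be read as how strongly a document loads on each latent topic, and each row of H as how strongly each word belongs to that topic. For anything beyond a toy matrix, a numerical library would be the more practical choice.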

Lesson 3: Sentiment Analysis
After covering how to represent text in a meaningful way and identifying the topics covered in a document, we now focus on how to extract sentiment information. In other words, what kind of sentiments are being expressed? Are the words used positive or negative? The next step is to consider corpus-based approaches to defining the valence of each word and, finally, how to handle negations and modifiers.
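
The following sketch shows one simple way a word-level valence lexicon can be combined with negations and modifiers. It is an illustrative assumption, not the method taught in the course: the tiny lexicon, the negation and modifier lists, and the rule that a negation or modifier carries forward to the next sentiment-bearing word are all invented for this example.

```python
# Hypothetical mini-lexicon mapping words to a valence score.
VALENCE = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
# Hypothetical negation words (flip the sign of the next sentiment word).
NEGATIONS = {"not", "never", "no"}
# Hypothetical modifiers (scale the strength of the next sentiment word).
MODIFIERS = {"very": 1.5, "slightly": 0.5}


def sentiment(text):
    """Sum word valences, applying pending negations and modifiers."""
    score = 0.0
    sign, scale = 1.0, 1.0
    for word in text.lower().split():
        if word in NEGATIONS:
            sign = -sign
        elif word in MODIFIERS:
            scale *= MODIFIERS[word]
        elif word in VALENCE:
            score += sign * scale * VALENCE[word]
            # Negations and modifiers are consumed by the word they affect.
            sign, scale = 1.0, 1.0
    return score
```

Under these assumptions, "not very bad" comes out positive because the negation flips the scaled negative valence of "bad". A corpus-based approach would learn the valence values from labeled text instead of fixing them by hand.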

Lesson 4: Applications
The first three lessons covered the fundamental tools of NLP, and now you are ready to consider specific applications and advanced topics. Perhaps one of the most important developments in NLP in recent years is the popularization of word embeddings in general and word2vec in particular. This enables you to delve deeper into vector representations of words and concepts, and to understand how semantic relations can be expressed through vector algebra. GloVe is the main competitor to word2vec, and this lesson also explores its advantages and disadvantages. As the final application of NLP and the last section in our course, we consider the question of language detection.
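
As a small illustration of how semantic relations can be expressed through vector algebra, the sketch below solves an analogy of the form "a is to b as c is to ?" by computing b - a + c and picking the nearest remaining word by cosine similarity. The three-dimensional vectors are hand-made assumptions chosen so the arithmetic is easy to follow; real word2vec or GloVe embeddings are learned from large corpora and have hundreds of dimensions.

```python
import math

# Hypothetical hand-made vectors; dimensions loosely mean
# (royalty, gender, food). Real embeddings are learned, not designed.
VECTORS = {
    "king":  [0.9, 0.8, 0.0],
    "queen": [0.9, -0.8, 0.0],
    "man":   [0.1, 0.8, 0.0],
    "woman": [0.1, -0.8, 0.0],
    "apple": [0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=VECTORS):
    """Answer 'a is to b as c is to ?' using the offset b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the query words themselves, as is commonly done.
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))
```

With these toy vectors, `analogy("man", "king", "woman")` selects "queen", since subtracting "man" from "king" and adding "woman" flips the gender dimension while keeping the royalty dimension.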

About Pearson Video Training

Pearson publishes expert-led video tutorials covering a wide selection of technology topics designed to teach you the skills you need to succeed. These professional and personal technology videos feature world-leading author instructors published by your trusted technology brands: Addison-Wesley, Cisco Press, Pearson IT Certification, Prentice Hall, Sams, and Que. Topics include IT Certification, Network Security, Cisco Technology, Programming, Web Development, Mobile Development, and more. Learn more about Pearson Video training at http://www.informit.com/video.