Skip to Content
Natural Language Processing with Spark NLP
book

Natural Language Processing with Spark NLP

by Alex Thomas
June 2020
Beginner to intermediate
364 pages
8h 58m
English
O'Reilly Media, Inc.
Content preview from Natural Language Processing with Spark NLP

Chapter 11. Word Embeddings

Word embeddings are part of distributional semantics, similar to the topic models we discussed in the previous chapter. Unlike topic models, word embeddings do not work on term-document relationships. Instead, word embeddings work with smaller contexts like sentences or subsequences of tokens in a sentence.

The field of word embeddings is a rapidly evolving set of techniques. The most popular technique, Word2vec, was developed in 2013 by Tomas Mikolov et al. at Google. Since then, there has been much research (and hype). The idea is that you use a neural network to build a language model. Once this model is learned, you can take some of the intermediate values in the network as representations of the input term.

In this chapter, we will look at the implementation of Word2vec in code. This will help give us a clear understanding of the fundamentals of this family of techniques. We will discuss the more recent approaches at a higher level because they can be quite resource intensive.

Word2vec

One of the ideas behind deep learning is that the hidden layers are “higher level” representations of the data. This comes from analysis of the visual cortex. As the information travels from the eye through the brain, neurons appear to be associated with more complex shapes. The early layers of neurons recognize only points of light and dark, later neurons recognize lines and curves, and so on. Using this assumption, if we train a language model using a neural network, ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

Read now

Unlock full access

More than 5,000 organizations count on O’Reilly

AirBnbBlueOriginElectronic ArtsHomeDepotNasdaqRakutenTata Consultancy Services

QuotationMarkO’Reilly covers everything we've got, with content to help us build a world-class technology community, upgrade the capabilities and competencies of our teams, and improve overall team performance as well as their engagement.
Julian F.
Head of Cybersecurity
QuotationMarkI wanted to learn C and C++, but it didn't click for me until I picked up an O'Reilly book. When I went on the O’Reilly platform, I was astonished to find all the books there, plus live events and sandboxes so you could play around with the technology.
Addison B.
Field Engineer
QuotationMarkI’ve been on the O’Reilly platform for more than eight years. I use a couple of learning platforms, but I'm on O'Reilly more than anybody else. When you're there, you start learning. I'm never disappointed.
Amir M.
Data Platform Tech Lead
QuotationMarkI'm always learning. So when I got on to O'Reilly, I was like a kid in a candy store. There are playlists. There are answers. There's on-demand training. It's worth its weight in gold, in terms of what it allows me to do.
Mark W.
Embedded Software Engineer

You might also like

Natural Language Processing (NLP)

Natural Language Processing (NLP)

Bruno Goncalves

Publisher Resources

ISBN: 9781492047759Errata Page