Summary
For developing applications that can understand voice or text input, we use techniques from the natural language processing domain. We have just seen several widely used ways to preprocess texts: tokenization, stop words removal, stemming, lemmatization, POS tagging, and named entity recognition.
Word embedding algorithms, and mainly Word2Vec, draw inspiration from the distributive semantics hypothesis, which states that the meaning of the word is defined by its context. Using an autoencoder-like neural network, we learn fixed-size vectors for each word in a text corpus. Effectively, this neural network captures the context of the word and encodes it in the corresponding vector. Then, using linear algebra operations with those vectors, ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access