Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

Doc2vec

A simple extension of the Word2vec model to the document level was proposed by Le and Mikolov. In this method, a unique document ID is added to the start of each document as an extra token. This ID is trained together with the words of the document: its vector is averaged (or concatenated) with the word embeddings of the surrounding context to predict words, and the vector learned for the ID becomes the document embedding. Consider the two example documents that we discussed earlier:

  • TensorFlow is an open source software library
  • Python is an open source interpreted software programming language

In contrast to the earlier approach, each document's token list now begins with its document ID:

  • [DOC_01, TensorFlow, is, an, open, source, software, library]
  • [DOC_02, Python, is, an, open, source, ...
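The following is a minimal, dependency-free sketch of the two ideas above, assuming simple whitespace tokenization and random toy vectors. It prepends a document-ID token to each token list and shows how a PV-DM-style model would average or concatenate the document vector with the context word vectors. The helper names `tag_documents`, `build_vectors`, and `combine` are hypothetical, and no training happens here; in practice a library such as gensim (`Doc2Vec` with `TaggedDocument`) learns the vectors.

```python
import random

# The two example documents from above, tokenized by a simple whitespace split
# (an assumption; real pipelines usually lowercase and strip punctuation).
sentences = [
    "TensorFlow is an open source software library",
    "Python is an open source interpreted software programming language",
]


def tag_documents(texts, prefix="DOC_"):
    """Return token lists that start with a unique document-ID token."""
    tagged = []
    for i, text in enumerate(texts, start=1):
        doc_id = f"{prefix}{i:02d}"  # DOC_01, DOC_02, ...
        tagged.append([doc_id] + text.split())
    return tagged


def build_vectors(tokens, dim, seed=0):
    """Give every token (document IDs included) a small random vector."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for tok in tokens}


def combine(doc_vec, word_vecs, mode="average"):
    """Merge the document vector with context word vectors, PV-DM style.

    The result would be fed to a classifier that predicts the next word;
    that classifier and the training loop are omitted here.
    """
    vecs = [doc_vec] + word_vecs
    if mode == "average":
        return [sum(vals) / len(vecs) for vals in zip(*vecs)]
    if mode == "concatenate":
        return [x for v in vecs for x in v]
    raise ValueError(f"unknown mode: {mode}")


docs = tag_documents(sentences)
print(docs[0])
# ['DOC_01', 'TensorFlow', 'is', 'an', 'open', 'source', 'software', 'library']

vocab = sorted({tok for doc in docs for tok in doc})
vectors = build_vectors(vocab, dim=4)

# Input for predicting the word after the context "is an open" in DOC_01
context = [vectors[w] for w in ["is", "an", "open"]]
avg_input = combine(vectors["DOC_01"], context)
cat_input = combine(vectors["DOC_01"], context, mode="concatenate")
print(len(avg_input), len(cat_input))  # 4 16
```

Averaging keeps the input size fixed at the embedding dimension, while concatenation preserves the order of the context at the cost of a larger input (here 4 vectors × 4 dimensions = 16 values).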