December 2018
Beginner to intermediate
684 pages
21h 9m
English
The gensim.models.Word2Vec class implements the skip-gram (SG) and CBOW architectures introduced previously. The Word2vec notebook contains additional implementation detail.
To ingest text memory-efficiently, the LineSentence class creates an iterable that streams sentences from the provided text file one at a time rather than loading the whole file into memory. It expects one sentence per line, with tokens separated by whitespace:
from pathlib import Path
from gensim.models.word2vec import LineSentence

sentence_path = Path('data', 'ngrams', 'ngrams_2.txt')
sentences = LineSentence(sentence_path)
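Training makes several passes over the corpus (one to build the vocabulary, then one per epoch), so the sentence source has to be restartable rather than a one-shot generator. The following is a minimal sketch of that streaming pattern using only the standard library. The class name StreamedSentences, the demo function, and the sample text are hypothetical and not part of gensim; the sketch assumes one sentence per line with whitespace-separated tokens.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


class StreamedSentences:
    """Hypothetical stand-in for illustration only; not part of gensim."""

    def __init__(self, path):
        self.path = Path(path)

    def __iter__(self):
        # Re-open the file on every pass so the corpus can be iterated repeatedly
        # while holding only one line in memory at a time.
        with self.path.open(encoding='utf-8') as f:
            for line in f:
                tokens = line.split()
                if tokens:  # skip blank lines
                    yield tokens


def demo():
    # Write a tiny two-sentence corpus to a temporary file and read it twice.
    with TemporaryDirectory() as tmp:
        path = Path(tmp) / 'sentences.txt'
        path.write_text('the quick brown fox\njumps over the lazy dog\n',
                        encoding='utf-8')
        sentences = StreamedSentences(path)
        first_pass = list(sentences)
        second_pass = list(sentences)
    return first_pass, second_pass
```

Because iteration restarts from the top of the file each time, both passes in demo() see the same sentences, which a plain generator object would not allow once it had been exhausted.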
The Word2Vec class offers the configuration options previously introduced:
from gensim.models import Word2Vec

model = Word2Vec(sentences,
                 sg=1,          # 1=skip-gram; otherwise CBOW
                 hs=0,          # hierarchical softmax if 1, negative sampling if 0
                 size=300,      # vector dimensionality (renamed vector_size in gensim 4.0)
                 window=3,      # max distance between target and context word
                 min_count=50,  # ignore words with lower frequency
                 negative ...