Building a text sentiment classifier with fastText

fastText is a library and is an extension of word2vec for word representation. It was created by the Facebook Research Team in 2016. While Word2vec and GloVe approaches treat words as the smallest unit to train on, fastText breaks words into several n-grams, that is, subwords. For example, the trigrams for the word apple are app, ppl, and ple. The word embedding for the word apple is sum of all the word n-grams. Due to the nature of the algorithm's embedding generation, fastText is more resource-intensive and takes additional time to train. Some of the advantages of fastText are as follows:

  • It generates better word embeddings for rare words (including misspelled words).
  • For out of vocabulary ...

Get Advanced Machine Learning with R now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.