fastText
fastText (https://fasttext.cc/) is a library for learning word embeddings and classifying text, created by the Facebook AI Research (FAIR) group. Word2Vec treats each word in the corpus as an atomic entity and generates one vector per word, but this approach ignores the internal structure of words. In contrast, fastText decomposes each word, w, into a bag of character n-grams. For example, if n = 3, we can decompose the word there into the character 3-grams listed here, together with the special sequence <there> for the whole word:
<th, the, her, ere, re>
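The decomposition can be reproduced with a few lines of Python. This is only a minimal sketch for a single value of n, not the fastText library's own implementation; the function name char_ngrams is hypothetical and chosen for illustration.

```python
def char_ngrams(word, n=3):
    """Return the character n-grams of `word`, followed by the whole-word sequence.

    Hypothetical helper for illustration only; not part of the fastText API.
    """
    # Wrap the word in the boundary markers < and >.
    marked = f"<{word}>"
    # Slide a window of width n over the marked word.
    grams = [marked[i:i + n] for i in range(len(marked) - n + 1)]
    # Add the special sequence for the whole word, unless the window
    # already produced it (this happens when the marked word has length n).
    if marked not in grams:
        grams.append(marked)
    return grams


print(char_ngrams("there"))
# ['<th', 'the', 'her', 'ere', 're>', '<there>']
```

The first five entries match the 3-grams listed above, and the last entry is the special sequence for the whole word.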
Note the use of the special characters < and > to indicate the start and the end of the word. This is necessary to avoid mismatching between n-grams from different words. For example, the word