fastText Quick Start Guide

by Joydeep Bhattacharjee
July 2018
Intermediate to advanced
194 pages
5h 22m
English
Packt Publishing

The n-gram features and the hashing trick

As you have seen, a bag of words (BoW) over the vocabulary is used to build the representation that is later fed to the classifier. However, a BoW is unordered and carries no syntactic information: "dog bites man" and "man bites dog" produce exactly the same bag. Hence, a bag of n-grams is used as an additional set of features to capture some of the local word order.
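The difference between the two feature sets can be seen in a small sketch. The helper names below (bag_of_words, bag_of_ngrams, features) are hypothetical and do not come from the fastText codebase; combining unigrams with word bigrams is only similar in spirit to what fastText's -wordNgrams option adds, not a reproduction of its internals.

```python
from collections import Counter


def bag_of_words(tokens):
    """Unordered multiset of unigrams: word order is discarded."""
    return Counter(tokens)


def bag_of_ngrams(tokens, n):
    """Multiset of contiguous word n-grams, each joined with a space."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def features(text, max_n=2):
    """Hypothetical feature set: unigrams plus every word n-gram up to max_n."""
    tokens = text.split()
    feats = Counter()
    for n in range(1, max_n + 1):
        feats.update(bag_of_ngrams(tokens, n))
    return feats


a = "dog bites man"
b = "man bites dog"
print(bag_of_words(a.split()) == bag_of_words(b.split()))  # True: the BoW cannot tell them apart
print(features(a) == features(b))  # False: the bigrams differ
print(sorted(bag_of_ngrams(a.split(), 2)))  # ['bites man', 'dog bites']
```

The bigrams "dog bites" and "bites man" record who does what to whom, which the unigram bag alone cannot express.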

As we have already discussed, large-scale NLP problems almost always involve a large corpus. Such a corpus will always have an unbounded number of unique words, as we have seen from Zipf's law. Words are generally defined as strings of characters separated by a delimiter, such as a space in English. The number of distinct word n-grams grows even faster than the number of distinct words, so taking word n-grams and storing a separate entry for each one is simply not scalable to large corpora, which is essential ...
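The hashing trick named in this section's title avoids storing one entry per distinct n-gram: each n-gram is hashed into one of a fixed number of buckets, so memory stays constant no matter how many n-grams the corpus contains, at the cost of occasional collisions. The sketch below assumes a standard 32-bit FNV-1a hash of the joined n-gram string and a hypothetical layout in which bucket rows follow the word rows of the embedding table. fastText exposes the bucket count through its -bucket option (2,000,000 by default), but its own hashing of word n-grams is not byte-for-byte identical to this simplified version.

```python
def fnv1a_32(s: str) -> int:
    """Standard 32-bit FNV-1a hash of the UTF-8 bytes of s."""
    h = 2166136261  # FNV offset basis
    for byte in s.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF  # FNV prime, truncated to 32 bits
    return h


def hashed_ngram_ids(tokens, n, num_buckets, vocab_size):
    """Map each word n-gram to a row in [vocab_size, vocab_size + num_buckets).

    Hypothetical layout: rows 0..vocab_size-1 hold ordinary words, and all
    n-grams share a fixed pool of num_buckets rows after them. Two different
    n-grams may collide on the same row; that is the accepted trade-off.
    """
    ids = []
    for i in range(len(tokens) - n + 1):
        gram = " ".join(tokens[i:i + n])
        ids.append(vocab_size + fnv1a_32(gram) % num_buckets)
    return ids


tokens = "the quick brown fox jumps".split()
ids = hashed_ngram_ids(tokens, n=2, num_buckets=2_000_000, vocab_size=10)
print(len(ids))  # 4 (one id per bigram)
```

Because the table size is fixed in advance, a previously unseen n-gram at prediction time still maps to a valid row instead of requiring a vocabulary lookup that could fail.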

ISBN: 9781789130997