November 2019
Intermediate to advanced
346 pages
9h 36m
English
In this section, we demonstrate a technique for extracting the most frequent N-grams quickly and with little memory. This helps us cope with the immense number of distinct N-grams that a corpus can contain. The technique, called Hash-Grams, hashes each N-gram as it is extracted and counts the hashes in a fixed-size table, rather than storing every N-gram itself. Because N-gram frequencies follow a power law, a small number of N-grams are far more frequent than all the others, so hash collisions have an insignificant impact on the quality of the resulting features.
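The following is a minimal Python sketch of the idea, not the book's implementation. It assumes byte-level N-grams, uses `zlib.crc32` as the hash (chosen because, unlike Python's built-in `hash()`, it gives the same result across runs), uses 2**16 buckets, and counts raw occurrences. The function names `extract_ngrams`, `hash_gram_counts`, `top_k_buckets`, and `featurize` are hypothetical.

```python
import heapq
import zlib
from typing import Iterable, List


def extract_ngrams(data: bytes, n: int) -> Iterable[bytes]:
    """Yield every contiguous byte N-gram of length n in data."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def hash_gram_counts(samples: Iterable[bytes], n: int, num_buckets: int) -> List[int]:
    """Count N-grams by hashing each one into a fixed number of buckets.

    Memory use is O(num_buckets) no matter how many distinct N-grams occur.
    N-grams that collide share a bucket, and their counts are summed.
    """
    counts = [0] * num_buckets
    for data in samples:
        for gram in extract_ngrams(data, n):
            # Assumption: crc32 as the hash; it is deterministic across runs.
            counts[zlib.crc32(gram) % num_buckets] += 1
    return counts


def top_k_buckets(counts: List[int], k: int) -> List[int]:
    """Return the indices of the k buckets with the largest counts."""
    return heapq.nlargest(k, range(len(counts)), key=counts.__getitem__)


def featurize(data: bytes, n: int, num_buckets: int, selected: List[int]) -> List[int]:
    """Feature vector for one sample: its N-gram counts in the selected buckets."""
    counts = hash_gram_counts([data], n, num_buckets)
    return [counts[b] for b in selected]


if __name__ == "__main__":
    num_buckets = 2 ** 16  # hypothetical table size
    corpus = [b"abcabcabc", b"abcxyz"]
    counts = hash_gram_counts(corpus, n=3, num_buckets=num_buckets)
    selected = top_k_buckets(counts, k=2)
    print(featurize(b"abcabcabc", 3, num_buckets, selected))
```

The selected bucket indices then serve as feature columns for every sample. Since the power law makes a frequent N-gram dominate whichever bucket it lands in, a high-count bucket behaves almost like a single frequent N-gram, which is why the occasional collision costs little.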