Appendix F. Locality sensitive hashing

In chapter 4, you learned how to create topic vectors with hundreds of dimensions of real-valued (floating-point) numbers. In chapter 6, you learned how to create word vectors that also have hundreds of dimensions. Even though you can do useful math operations on these vectors, you cannot search them quickly the way you can search discrete vectors or strings. Databases don’t have efficient indexing schemes for vectors of more than four dimensions.[1] To use word vectors and document topic vectors efficiently, you need a search index that can find the nearest neighbors of any given vector.

You need this to convert the results of vector math into a word or set of words (because the resultant vector is never an exact ...
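To make the idea of a nearest-neighbor search index concrete, here is a minimal sketch of one common locality sensitive hashing scheme: random hyperplane projections for cosine similarity. It is an illustration under stated assumptions, not necessarily the approach this appendix goes on to describe. The function names (`hash_keys`, `build_lsh_index`, `query`), the number of hyperplanes, and the random stand-in vectors are all hypothetical choices made for this example.

```python
import numpy as np


def hash_keys(vectors, planes):
    """Return one bit-string key per vector: which side of each hyperplane it falls on."""
    bits = (np.atleast_2d(vectors) @ planes.T) >= 0
    return ["".join("1" if b else "0" for b in row) for row in bits]


def build_lsh_index(vectors, num_planes=8, seed=0):
    """Group vector row numbers into buckets keyed by their hyperplane sign pattern."""
    rng = np.random.default_rng(seed)
    # Each row of `planes` is the normal vector of one random hyperplane (an assumption
    # of this sketch; other LSH families use different hash functions).
    planes = rng.standard_normal((num_planes, vectors.shape[1]))
    index = {}
    for row_number, key in enumerate(hash_keys(vectors, planes)):
        index.setdefault(key, []).append(row_number)
    return planes, index


def query(vector, vectors, planes, index, top_k=3):
    """Rank only the vectors that share the query's bucket, by cosine similarity."""
    key = hash_keys(vector, planes)[0]
    candidates = index.get(key, [])
    if not candidates:
        return []
    cand = vectors[candidates]
    sims = cand @ vector / (np.linalg.norm(cand, axis=1) * np.linalg.norm(vector))
    order = np.argsort(-sims)[:top_k]
    return [(candidates[j], float(sims[j])) for j in order]


# Hypothetical stand-in data: 1,000 random 100-dimensional "word vectors".
rng = np.random.default_rng(42)
vectors = rng.standard_normal((1000, 100))
planes, index = build_lsh_index(vectors, num_planes=8)
neighbors = query(vectors[7], vectors, planes, index)
```

The search is approximate: a true nearest neighbor that falls on the other side of even one hyperplane lands in a different bucket and is missed. Using fewer hyperplanes, or several independent hash tables, trades speed for recall.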
