Chapter 16. Acceleration Structures
So what are acceleration structures? In computer science terminology, if you score every item in a corpus one by one, the time it takes grows proportionally to the number of items, N. In big O notation, this is written O(N). So if you have a user vector and a corpus of N items, it typically takes O(N) time to score all the items in the corpus for one user. This is usually tractable if N is small and the corpus fits into GPU RAM, typically N < 1 million items or so. However, with a very large corpus of, say, a billion items, scoring can take a very long time if we also have to make recommendations for a billion users: in big O notation, that is O(10^18) dot products to score a billion items for each and every one of a billion users.
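To make the brute-force baseline concrete, here is a minimal sketch in JAX of scoring a whole corpus for one user with plain dot products and then taking the top k. The corpus size, embedding dimension, and random embeddings are hypothetical placeholders so the example runs anywhere; a real corpus would load trained embeddings instead.

```python
import jax
import jax.numpy as jnp
from jax import random

# Hypothetical sizes: a tiny corpus so the sketch runs on any machine.
num_items, dim = 1_000, 64

key = random.PRNGKey(0)
item_key, user_key = random.split(key)
item_embeddings = random.normal(item_key, (num_items, dim))
user_embedding = random.normal(user_key, (dim,))

# Brute-force scoring: one dot product per item, O(N) in corpus size.
scores = item_embeddings @ user_embedding  # shape (num_items,)

# Retrieve the k best items by exhaustively ranking the full corpus.
top_scores, top_indices = jax.lax.top_k(scores, k=10)
```

Scoring M users this way multiplies the cost by M, which is exactly the O(N * M) blowup the rest of this chapter works to avoid.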
In this chapter, we will try to reduce the O(N * M) time to something sublinear in the number of items N and the number of users M. We will discuss strategies including the following:
- Sharding
- Locality sensitive hashing
- k-d trees
- Hierarchical k-means
- Cheaper retrieval methods
We’ll also cover the trade-offs of each strategy and what each can be used for. For all the following examples, we assume that users and items are represented by embedding vectors of the same size and that the affinity between a user and an item is a simple dot product, cosine distance, or Euclidean distance. If we were to use a neural network like a two-tower model to score the user and ...
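For reference, the three affinity measures assumed above can be sketched as follows. The function names are our own; note that the dot product and cosine measures are similarities (larger is better), while Euclidean distance is a distance (smaller is better).

```python
import jax.numpy as jnp

def dot_affinity(user, item):
    # Unnormalized similarity: larger means a better match.
    return jnp.dot(user, item)

def cosine_similarity(user, item):
    # Dot product of L2-normalized vectors; lies in [-1, 1].
    return jnp.dot(user, item) / (
        jnp.linalg.norm(user) * jnp.linalg.norm(item)
    )

def euclidean_distance(user, item):
    # A distance, not a similarity: smaller means a better match.
    return jnp.linalg.norm(user - item)
```

On unit-norm embeddings the three orderings agree: cosine similarity equals the dot product, and squared Euclidean distance is 2 minus twice the dot product, so ranking by any of them yields the same nearest neighbors.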