Chapter 16. Acceleration Structures
So what are acceleration structures? In computer science terminology, if you score every item in a corpus one by one, the time it takes grows proportionally to the number of items, N. In big O notation, this is written O(N). So if you have a user vector and a corpus of N items, it typically takes O(N) time to score all the items in the corpus for one user. This is usually tractable if N is small and the corpus fits into GPU RAM, typically N < 1 million items or so. However, with a very large corpus of, say, a billion items, scoring can take a very long time if we also have to make recommendations for a billion users: in big O notation, that is O(10^18) dot products to score a billion items for each and every one of a billion users.
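To make the brute-force baseline concrete, here is a minimal sketch in JAX of scoring a whole corpus for one user with plain dot products and then taking the top k. The corpus size, embedding dimension, and random embeddings are hypothetical placeholders so the example runs anywhere; a real corpus would load trained embeddings instead.

```python
import jax
import jax.numpy as jnp
from jax import random

# Hypothetical sizes: a tiny corpus so the sketch runs on any machine.
num_items, dim = 1_000, 64

key = random.PRNGKey(0)
item_key, user_key = random.split(key)
item_embeddings = random.normal(item_key, (num_items, dim))
user_embedding = random.normal(user_key, (dim,))

# Brute-force scoring: one dot product per item, O(N) in corpus size.
scores = item_embeddings @ user_embedding  # shape (num_items,)

# Retrieve the k best items by exhaustively ranking the full corpus.
top_scores, top_indices = jax.lax.top_k(scores, k=10)
```

Scoring M users this way multiplies the cost by M, which is exactly the O(N * M) blowup the rest of this chapter works to avoid.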
In this chapter, we will try to reduce the O(N * M) time to something sublinear in the number of items N and the number of users M. We will discuss strategies including the following:
- Sharding
- Locality sensitive hashing
- k-d trees
- Hierarchical k-means
- Cheaper retrieval methods
We’ll also cover the trade-offs of each strategy and what each can be used for. For all the following examples, we assume that users and items are represented by embedding vectors of the same size and that the affinity between a user and an item is a simple dot product, cosine distance, or Euclidean distance. If we were to use a neural network like a two-tower model to score the user and ...
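For reference, the three affinity measures assumed above can be sketched as follows. The function names are our own; note that the dot product and cosine measures are similarities (larger is better), while Euclidean distance is a distance (smaller is better).

```python
import jax.numpy as jnp

def dot_affinity(user, item):
    # Unnormalized similarity: larger means a better match.
    return jnp.dot(user, item)

def cosine_similarity(user, item):
    # Dot product of L2-normalized vectors; lies in [-1, 1].
    return jnp.dot(user, item) / (
        jnp.linalg.norm(user) * jnp.linalg.norm(item)
    )

def euclidean_distance(user, item):
    # A distance, not a similarity: smaller means a better match.
    return jnp.linalg.norm(user - item)
```

On unit-norm embeddings the three orderings agree: cosine similarity equals the dot product, and squared Euclidean distance is 2 minus twice the dot product, so ranking by any of them yields the same nearest neighbors.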