Chapter 8. Putting It All Together: Data Processing and Counting Recommender

Now that we have discussed the broad outline of recommender systems, this chapter turns that outline into a concrete implementation so that we can discuss the choice of technologies and the specifics of how the implementation works in practice.

This chapter covers the following topics:

  • Data representation with protocol buffers

  • Data processing frameworks

  • A PySpark sample program

  • GloVe embedding model

  • Additional foundational techniques in JAX, Flax, and Optax

We will show step-by-step how to go from a downloaded Wikipedia dataset to a recommender system that can recommend words from Wikipedia based on their co-occurrence with other words in a Wikipedia article. We use a natural language example because words are easy to understand and their relationships are easy to grasp: related words tend to occur near one another in a sentence. Furthermore, the Wikipedia corpus is easily downloadable and browsable by anyone with an internet connection. This idea of co-occurrence can be generalized to any co-occurring collection of items, such as videos watched in the same session or cheeses purchased in the same shopping bag.
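Before bringing in any data processing framework, it helps to see the core counting idea at toy scale. The following is a minimal in-memory sketch, not the chapter's PySpark pipeline: it assumes the corpus is already tokenized into lists of words, and the helper names `count_cooccurrences` and `recommend` and the `window` parameter are hypothetical choices made for illustration. Two words "co-occur" if they appear within `window` tokens of each other, and a word's recommendations are simply its most frequent co-occurring neighbors.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def count_cooccurrences(sentences: List[List[str]], window: int = 2) -> Dict[str, Counter]:
    """Count how often each word appears within `window` tokens of every other word.

    Hypothetical helper for illustration; a real pipeline would run this
    counting step in a distributed framework over the full corpus.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Neighbors on both sides of position i, clipped to the sentence bounds.
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word does not co-occur with itself at the same position
                    counts[word][tokens[j]] += 1
    return counts


def recommend(counts: Dict[str, Counter], query: str, k: int = 3) -> List[str]:
    """Return up to k words that co-occur most often with `query` (ties keep first-seen order)."""
    return [word for word, _ in counts.get(query, Counter()).most_common(k)]


if __name__ == "__main__":
    sentences = [
        ["the", "cheese", "is", "aged"],
        ["aged", "cheese", "tastes", "sharp"],
        ["sharp", "cheddar", "cheese"],
    ]
    counts = count_cooccurrences(sentences, window=2)
    print(recommend(counts, "cheese", k=2))  # prints ['aged', 'sharp']
```

The same shape carries over to the non-text examples: replace sentences with sessions or shopping bags and words with item IDs, and the counts become item-item co-occurrence scores.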

This chapter will demonstrate concrete implementations of an item-item and a feature-item recommender. Items in this case are the words in an article, and the features are word-count similarity: a MinHash, which is a kind of locality-sensitive hash for words. Chapter 16 covers locality ...
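To make the MinHash idea concrete, here is a minimal sketch of the standard min-of-hashes signature, which estimates the Jaccard similarity between two sets of words. It is not necessarily how the chapter's feature-item recommender builds its features: the use of seeded SHA-1 digests as a family of hash functions, the `num_hashes` default, and the helper names are all assumptions for illustration. The key property is that, for a random hash function, the probability that two sets share the same minimum hash value equals their Jaccard similarity, so the fraction of matching signature slots is an estimate of it.

```python
import hashlib
from typing import Iterable, List


def _seeded_hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of `token`; each seed acts as a separate hash function (assumption)."""
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(words: Iterable[str], num_hashes: int = 128) -> List[int]:
    """For each hash function, keep the minimum hash value over the set of unique words."""
    unique = set(words)
    if not unique:
        raise ValueError("MinHash signature is undefined for an empty set")
    return [min(_seeded_hash(w, seed) for w in unique) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of positions where the two signatures agree approximates |A & B| / |A | B|."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same number of hash functions")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Exact Jaccard similarity, for comparing against the MinHash estimate."""
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    doc_a = [f"word{i}" for i in range(0, 100)]
    doc_b = [f"word{i}" for i in range(50, 150)]
    sig_a = minhash_signature(doc_a, num_hashes=256)
    sig_b = minhash_signature(doc_b, num_hashes=256)
    print("exact:", exact_jaccard(doc_a, doc_b))
    print("estimate:", estimated_jaccard(sig_a, sig_b))
```

The signatures have a fixed length regardless of how many words a document contains, which is what makes them usable as compact features; more hash functions reduce the variance of the estimate at the cost of more computation per document.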
