Building Recommendation Systems in Python and JAX

by Bryan Bischof, Hector Yee
December 2023
Content level: Intermediate to advanced
338 pages
8h 57m
English
O'Reilly Media, Inc.

Chapter 6. Data Processing

In the trivial recommender we defined in Chapter 1, we used the method get_availability, and in the MPIR we used the method get_item_popularities. We hoped the names would convey enough about what these methods do, but we did not go into how they are implemented. Now we will start unpacking some of this complexity and present the toolsets for online and offline data collectors.
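To make this concrete before we get to real data, here is a minimal, hypothetical sketch of what a method like get_item_popularities could compute: a count of interaction events per item. The event format (a mapping with an `item_id` key) and the helper `top_k_popular` are assumptions for illustration, not the book's implementation.

```python
from collections import Counter
from collections.abc import Iterable, Mapping


def get_item_popularities(events: Iterable[Mapping[str, str]]) -> dict[str, int]:
    """Count interaction events per item.

    Hypothetical sketch: assumes every event is a mapping with an 'item_id' key.
    """
    return dict(Counter(event["item_id"] for event in events))


def top_k_popular(popularities: Mapping[str, int], k: int) -> list[str]:
    """Return the k item IDs with the highest counts (hypothetical helper)."""
    # Counter.most_common sorts by count, highest first.
    return [item for item, _ in Counter(popularities).most_common(k)]


if __name__ == "__main__":
    events = [{"item_id": "a"}, {"item_id": "b"}, {"item_id": "a"}]
    popularities = get_item_popularities(events)
    print(popularities)
    print(top_k_popular(popularities, 1))
```

A most-popular-item recommender could then serve the same `top_k_popular` list to every user; the hard part, which the rest of this chapter addresses, is producing clean events to count in the first place.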

Hydrating Your System

Getting data into the pipeline is punnily referred to as hydration. The ML and data fields have many water-themed naming conventions; “(Data ∩ Water) Terms” by Pardis Noorzad covers this topic.

PySpark

Spark is a general-purpose distributed computing engine, with APIs for Java, Python, SQL, and Scala. In many ML pipelines, PySpark (Spark's Python API) is used to process and transform large-scale datasets.

Let’s return to the data structure we introduced for our recommendation problem; recall that the user-item matrix is the linear-algebraic representation of all the triples of users, items, and the user’s rating of the item. These triples do not occur naturally in the wild. Most commonly, you begin with log files from your system; for example, Bookshop.org may have something that looks like this:

	'page_view_id': 'd15220a8e9a8e488162af3120b4396a9ca1',
	'anonymous_id': 'e455d516-3c08-4b6f-ab12-77f930e2661f',
	'view_tstamp': 2020-10-29 17:44:41+00:00,
	'page_url': 'https://bookshop.org/lists/best-sellers-of-the-week',
	'page_url_host' ...
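To illustrate the step from raw logs to triples, here is a minimal sketch in plain Python. It assumes a hypothetical aggregation rule: `anonymous_id` identifies the user, `page_url` identifies the item, and the implicit rating is the number of times that user viewed that URL. The field names come from the log sample; the rule itself and the helper names `logs_to_triples` and `triples_to_matrix` are illustrative assumptions.

```python
from collections import Counter
from collections.abc import Iterable, Mapping


def logs_to_triples(
    page_views: Iterable[Mapping[str, str]],
) -> list[tuple[str, str, int]]:
    """Collapse page-view logs into (user, item, implicit_rating) triples.

    Hypothetical rule: user = 'anonymous_id', item = 'page_url',
    rating = number of views of that URL by that user.
    """
    counts = Counter((view["anonymous_id"], view["page_url"]) for view in page_views)
    # Sort so the output does not depend on the order of the log lines.
    return sorted((user, item, n) for (user, item), n in counts.items())


def triples_to_matrix(
    triples: list[tuple[str, str, int]],
) -> tuple[list[str], list[str], list[list[int]]]:
    """Build a dense user-item matrix (rows = users, columns = items)."""
    users = sorted({user for user, _, _ in triples})
    items = sorted({item for _, item, _ in triples})
    user_idx = {user: row for row, user in enumerate(users)}
    item_idx = {item: col for col, item in enumerate(items)}
    # Unobserved (user, item) pairs default to 0.
    matrix = [[0] * len(items) for _ in users]
    for user, item, rating in triples:
        matrix[user_idx[user]][item_idx[item]] = rating
    return users, items, matrix


if __name__ == "__main__":
    logs = [
        {"anonymous_id": "u1", "page_url": "https://example.com/a"},
        {"anonymous_id": "u1", "page_url": "https://example.com/a"},
        {"anonymous_id": "u2", "page_url": "https://example.com/b"},
    ]
    triples = logs_to_triples(logs)
    print(triples)
    print(triples_to_matrix(triples))
```

In PySpark, the counting step corresponds to a DataFrame `groupBy("anonymous_id", "page_url").count()`. At realistic scale, the user-item matrix would be stored sparsely rather than as the dense nested list built here.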

You might also like

Python Testing with pytest

Brian Okken
Introduction to Python

Jessica McKellar

Publisher Resources

ISBN: 9781492097983