Building Recommendation Systems in Python and JAX

by Bryan Bischof, Hector Yee

December 2023

Intermediate to advanced

338 pages

8h 57m

English

O'Reilly Media, Inc.

Preface
Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
I. Warming Up
1. Introduction
Key Components of a Recommendation System; Collector; Ranker; Server; Simplest Possible Recommenders; The Trivial Recommender; Most-Popular-Item Recommender; A Gentle Introduction to JAX; Basic Types, Initialization, and Immutability; Indexing and Slicing; Broadcasting; Random Numbers; Just-in-Time Compilation; Summary
2. User-Item Ratings and Framing the Problem
The User-Item Matrix; User-User Versus Item-Item Collaborative Filtering; The Netflix Challenge; Soft Ratings; Data Collection and User Logging; What to Log; Collection and Instrumentation; Funnels; Business Insight and What People Like; Summary
3. Mathematical Considerations
Zipf’s Laws in RecSys and the Matthew Effect; Sparsity; User Similarity for Collaborative Filtering; Pearson Correlation; Ratings via Similarity; Explore-Exploit as a Recommendation System; ϵ-greedy; What Should ϵ Be?; The NLP-RecSys Relationship; Vector Search; Nearest-Neighbors Search; Summary
4. System Design for Recommending
Online Versus Offline; Collector; Offline Collector; Online Collector; Ranker; Offline Ranker; Online Ranker; Server; Offline Server; Online Server; Summary
5. Putting It All Together: Content-Based Recommender
Revision Control Software; Python Build Systems; Random-Item Recommender; Obtaining the STL Dataset Images; Convolutional Neural Network Definition; Model Training in JAX, Flax, and Optax; Input Pipeline; Summary
II. Retrieval
6. Data Processing
Hydrating Your System; PySpark; Example: User Similarity in PySpark; DataLoaders; Database Snapshots; Data Structures for Learning and Inference; Vector Search; Approximate Nearest Neighbors; Bloom Filters; Fun Aside: Bloom Filters as the Recommendation System; Feature Stores; Summary
7. Serving Models and Architectures
Architectures by Recommendation Structure; Item-to-User Recommendations; Query-Based Recommendations; Context-Based Recommendations; Sequence-Based Recommendations; Why Bother with Extra Features?; Encoder Architectures and Cold Starting; Deployment; Models as APIs; Spinning Up a Model Service; Workflow Orchestration; Alerting and Monitoring; Schemas and Priors; Integration Tests; Observability; Evaluation in Production; Slow Feedback; Model Metrics; Continuous Training and Deployment; Model Drift; Deployment Topologies; The Evaluation Flywheel; Daily Warm Starts; Lambda Architecture and Orchestration; Logging; Active Learning; Summary
8. Putting It All Together: Data Processing and Counting Recommender
Tech Stack; Data Representation; Big Data Frameworks; Cluster Frameworks; PySpark Example; GloVE Model Definition; GloVE Model Specification in JAX and Flax; GloVE Model Training with Optax; Summary
III. Ranking
9. Feature-Based and Counting-Based Recommendations
Bilinear Factor Models (Metric Learning); Feature-Based Warm Starting; Segmentation Models and Hybrids; Tag-Based Recommenders; Hybridization; Limitations of Bilinear Models; Counting Recommenders; Return to the Most-Popular-Item Recommender; Correlation Mining; Pointwise Mutual Information via Co-occurrences; Similarity from Co-occurrence; Similarity-Based Recommendations; Summary
10. Low-Rank Methods
Latent Spaces; Dot Product Similarity; Co-occurrence Models; Reducing the Rank of a Recommender Problem; Optimizing for MF with ALS; Regularization for MF; Regularized MF Implementation; WSABIE; Dimension Reduction; Isometric Embeddings; Nonlinear Locally Metrizable Embeddings; Centered Kernel Alignment; Affinity and p-sale; Propensity Weighting for Recommendation System Evaluation; Propensity; Simpson’s and Mitigating Confounding; Summary
11. Personalized Recommendation Metrics
Environments; Online and Offline; User Versus Item Metrics; A/B Testing; Recall and Precision@k; Precision at k; Recall at k; R-precision; mAP, MMR, NDCG; mAP; MRR; NDCG; mAP Versus NDCG?; Correlation Coefficients; RMSE from Affinity; Integral Forms: AUC and cAUC; Recommendation Probabilities to AUC-ROC; Comparison to Other Metrics; BPR; Summary
12. Training for Ranking
Where Does Ranking Fit in Recommender Systems?; Learning to Rank; Training an LTR Model; Classification for Ranking; Regression for Ranking; Classification and Regression for Ranking; WARP; k-order Statistic; BM25; Multimodal Retrieval; Summary
13. Putting It All Together: Experimenting and Ranking
Experimentation Tips; Keep It Simple; Debug Print Statements; Defer Optimization; Keep Track of Changes; Use Feature Engineering; Understand Metrics Versus Business Metrics; Perform Rapid Iteration; Spotify Million Playlist Dataset; Building URI Dictionaries; Building the Training Data; Reading the Input; Modeling the Problem; Framing the Loss Function; Exercises; Summary
IV. Serving
14. Business Logic
Hard Ranking; Learned Avoids; Hand-Tuned Weights; Inventory Health; Implementing Avoids; Model-Based Avoids; Summary
15. Bias in Recommendation Systems
Diversification of Recommendations; Improving Diversity; Applying Portfolio Optimization; Multiobjective Functions; Predicate Pushdown; Fairness; Summary
16. Acceleration Structures
Sharding; Locality Sensitive Hashing; k-d Trees; Hierarchical k-means; Cheaper Retrieval Methods; Summary
V. The Future of Recs
17. Sequential Recommenders
Markov Chains; Order-Two Markov Chain; Other Markov Models; RNN and CNN Architectures; Attention Architectures; Self-Attentive Sequential Recommendation; BERT4Rec; Recency Sampling; Merging Static and Sequential; Summary
18. What’s Next for Recs?
Multimodal Recommendations; Graph-Based Recommenders; Neural Message Passing; Applications; Random Walks; Metapath and Heterogeneity; LLM Applications; LLM Recommenders; LLM Training; Instruct Tuning for Recommendations; LLM Rankers; Recommendations for AI; Summary
Index
About the Authors

Content preview from Building Recommendation Systems in Python and JAX

Part II. Retrieval

How do we get all the data in the right place to train a recommendation system? How do we build and deploy systems for real-time inference?

Reading research papers about recommendation systems will often give the impression that they’re built via a bunch of math equations, and all the really hard work of using recommendation systems is in connecting these equations to the features of your problem. More realistically, the first several steps of building a production recommendation system fall under systems engineering. Understanding how your data will make it into your system, be manipulated into the correct structure, and then be available in each of the relevant steps of the training flow often constitutes the bulk of the initial recommendation system’s work. But even beyond this initial phase, ensuring that all the necessary components are fast enough and robust enough for production environments requires yet another significant investment in platform infrastructure.

Often you’ll build a component responsible for processing the various types of data and storing them in a convenient format. Next, you’ll construct a model that takes that data and encodes it in a latent space or other representation model. Finally, you’ll need to transform an input request into the representation as a query in this space. These steps usually take the form of jobs in a workflow management platform or services deployed as endpoints. The next few chapters will walk you through ...
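The three steps described above (process and store the data, encode it in a representation space, then turn a request into a query in that space) can be sketched in a few lines of plain Python. This is only an illustrative toy under stated assumptions, not the book's implementation: the `(user_id, item_id)` log schema, the function names, and the use of normalized user-incidence vectors as a stand-in for a learned latent space are all hypothetical.

```python
import math
from collections import defaultdict


def build_user_histories(logs):
    """Step 1: process raw logs into a convenient stored format.

    Assumes a hypothetical log schema of (user_id, item_id) pairs.
    """
    histories = defaultdict(set)
    for user_id, item_id in logs:
        histories[user_id].add(item_id)
    return dict(histories)


def encode_items(histories):
    """Step 2: encode every item as a vector in a shared space.

    Each item becomes a unit-length vector over users who interacted
    with it; a real system would typically learn this representation.
    """
    users = sorted(histories)
    items = sorted({item for seen in histories.values() for item in seen})
    vectors = {}
    for item in items:
        raw = [1.0 if item in histories[user] else 0.0 for user in users]
        norm = math.sqrt(sum(x * x for x in raw))  # nonzero: item was seen
        vectors[item] = [x / norm for x in raw]
    return vectors


def encode_query(seen_items, item_vectors):
    """Step 3a: turn a request (items already seen) into a query vector."""
    dim = len(next(iter(item_vectors.values())))
    query = [0.0] * dim
    for item in seen_items:
        for d, x in enumerate(item_vectors[item]):
            query[d] += x
    return query


def retrieve(query, item_vectors, exclude, k=2):
    """Step 3b: return the k items with the highest dot-product score."""
    scores = {
        item: sum(q * x for q, x in zip(query, vec))
        for item, vec in item_vectors.items()
        if item not in exclude
    }
    # Sort by descending score, breaking ties by item name.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


logs = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "c"),
        ("u3", "b"), ("u3", "c"), ("u3", "d")]
histories = build_user_histories(logs)
item_vectors = encode_items(histories)
query = encode_query({"a"}, item_vectors)
recommendations = retrieve(query, item_vectors, exclude={"a"}, k=2)
```

In a production setting, each function would more plausibly be its own workflow job or deployed service, and the brute-force scoring in `retrieve` would be replaced by a vector search index.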


Publisher Resources

ISBN: 9781492097983
