Building Recommendation Systems in Python and JAX

Book description

Implementing and designing systems that make suggestions to users are among the most popular and essential machine learning applications available. Whether you want customers to find the most appealing items at your online store, videos to enrich and entertain them, or news they need to know, recommendation systems (RecSys) provide the way.

In this practical book, authors Bryan Bischof and Hector Yee illustrate the core concepts with examples to help you create a RecSys for any industry or scale. You'll learn the math, ideas, and implementation details you need to succeed. The book covers RecSys platform components and the relevant MLOps tools for your stack, along with code examples and helpful suggestions in PySpark, SparkSQL, FastAPI, and Weights & Biases.

You'll learn:

  • The data essential for building a RecSys
  • How to frame your data and business as a RecSys problem
  • Ways to evaluate models appropriate for your system
  • Methods to implement, train, test, and deploy the model you choose
  • Metrics you need to track to ensure your system is working as planned
  • How to improve your system as you learn more about your users, products, and business case

Table of contents

  1. Preface
    1. Conventions Used in This Book
    2. Using Code Examples
    3. O’Reilly Online Learning
    4. How to Contact Us
    5. Acknowledgments
  2. I. Warming Up
  3. 1. Introduction
    1. Key Components of a Recommendation System
      1. Collector
      2. Ranker
      3. Server
    2. Simplest Possible Recommenders
      1. The Trivial Recommender
      2. Most-Popular-Item Recommender
    3. A Gentle Introduction to JAX
      1. Basic Types, Initialization, and Immutability
      2. Indexing and Slicing
      3. Broadcasting
      4. Random Numbers
      5. Just-in-Time Compilation
    4. Summary
  4. 2. User-Item Ratings and Framing the Problem
    1. The User-Item Matrix
    2. User-User Versus Item-Item Collaborative Filtering
    3. The Netflix Challenge
    4. Soft Ratings
    5. Data Collection and User Logging
      1. What to Log
      2. Collection and Instrumentation
      3. Funnels
    6. Business Insight and What People Like
    7. Summary
  5. 3. Mathematical Considerations
    1. Zipf’s Laws in RecSys and the Matthew Effect
    2. Sparsity
    3. User Similarity for Collaborative Filtering
      1. Pearson Correlation
      2. Ratings via Similarity
    4. Explore-Exploit as a Recommendation System
      1. ϵ-greedy
      2. What Should ϵ Be?
    5. The NLP-RecSys Relationship
      1. Vector Search
      2. Nearest-Neighbors Search
    6. Summary
  6. 4. System Design for Recommending
    1. Online Versus Offline
    2. Collector
      1. Offline Collector
      2. Online Collector
    3. Ranker
      1. Offline Ranker
      2. Online Ranker
    4. Server
      1. Offline Server
      2. Online Server
    5. Summary
  7. 5. Putting It All Together: Content-Based Recommender
    1. Revision Control Software
    2. Python Build Systems
    3. Random-Item Recommender
    4. Obtaining the STL Dataset Images
    5. Convolutional Neural Network Definition
    6. Model Training in JAX, Flax, and Optax
    7. Input Pipeline
    8. Summary
  8. II. Retrieval
  9. 6. Data Processing
    1. Hydrating Your System
      1. PySpark
      2. Example: User Similarity in PySpark
      3. DataLoaders
      4. Database Snapshots
    2. Data Structures for Learning and Inference
      1. Vector Search
      2. Approximate Nearest Neighbors
      3. Bloom Filters
      4. Fun Aside: Bloom Filters as the Recommendation System
      5. Feature Stores
    3. Summary
  10. 7. Serving Models and Architectures
    1. Architectures by Recommendation Structure
      1. Item-to-User Recommendations
      2. Query-Based Recommendations
      3. Context-Based Recommendations
      4. Sequence-Based Recommendations
      5. Why Bother with Extra Features?
    2. Encoder Architectures and Cold Starting
    3. Deployment
      1. Models as APIs
      2. Spinning Up a Model Service
      3. Workflow Orchestration
    4. Alerting and Monitoring
      1. Schemas and Priors
      2. Integration Tests
      3. Observability
    5. Evaluation in Production
      1. Slow Feedback
      2. Model Metrics
    6. Continuous Training and Deployment
      1. Model Drift
      2. Deployment Topologies
    7. The Evaluation Flywheel
      1. Daily Warm Starts
      2. Lambda Architecture and Orchestration
      3. Logging
      4. Active Learning
    8. Summary
  11. 8. Putting It All Together: Data Processing and Counting Recommender
    1. Tech Stack
    2. Data Representation
    3. Big Data Frameworks
      1. Cluster Frameworks
      2. PySpark Example
    4. GloVE Model Definition
      1. GloVE Model Specification in JAX and Flax
      2. GloVE Model Training with Optax
      3. Summary
  12. III. Ranking
  13. 9. Feature-Based and Counting-Based Recommendations
    1. Bilinear Factor Models (Metric Learning)
    2. Feature-Based Warm Starting
    3. Segmentation Models and Hybrids
      1. Tag-Based Recommenders
      2. Hybridization
    4. Limitations of Bilinear Models
    5. Counting Recommenders
      1. Return to the Most-Popular-Item Recommender
      2. Correlation Mining
      3. Pointwise Mutual Information via Co-occurrences
      4. Similarity from Co-occurrence
      5. Similarity-Based Recommendations
    6. Summary
  14. 10. Low-Rank Methods
    1. Latent Spaces
    2. Dot Product Similarity
    3. Co-occurrence Models
    4. Reducing the Rank of a Recommender Problem
      1. Optimizing for MF with ALS
      2. Regularization for MF
      3. Regularized MF Implementation
      4. WSABIE
    5. Dimension Reduction
      1. Isometric Embeddings
      2. Nonlinear Locally Metrizable Embeddings
      3. Centered Kernel Alignment
    6. Affinity and p-sale
    7. Propensity Weighting for Recommendation System Evaluation
      1. Propensity
      2. Simpson’s and Mitigating Confounding
    8. Summary
  15. 11. Personalized Recommendation Metrics
    1. Environments
      1. Online and Offline
      2. User Versus Item Metrics
      3. A/B Testing
    2. Recall and Precision
      1. @ k
      2. Precision at k
      3. Recall at k
      4. R-precision
    3. mAP, MRR, NDCG
      1. mAP
      2. MRR
      3. NDCG
      4. mAP Versus NDCG?
      5. Correlation Coefficients
    4. RMSE from Affinity
    5. Integral Forms: AUC and cAUC
      1. Recommendation Probabilities to AUC-ROC
      2. Comparison to Other Metrics
    6. BPR
    7. Summary
  16. 12. Training for Ranking
    1. Where Does Ranking Fit in Recommender Systems?
    2. Learning to Rank
    3. Training an LTR Model
      1. Classification for Ranking
      2. Regression for Ranking
      3. Classification and Regression for Ranking
    4. WARP
    5. k-order Statistic
    6. BM25
    7. Multimodal Retrieval
    8. Summary
  17. 13. Putting It All Together: Experimenting and Ranking
    1. Experimentation Tips
      1. Keep It Simple
      2. Debug Print Statements
      3. Defer Optimization
      4. Keep Track of Changes
      5. Use Feature Engineering
      6. Understand Metrics Versus Business Metrics
      7. Perform Rapid Iteration
    2. Spotify Million Playlist Dataset
      1. Building URI Dictionaries
      2. Building the Training Data
      3. Reading the Input
      4. Modeling the Problem
      5. Framing the Loss Function
    3. Exercises
    4. Summary
  18. IV. Serving
  19. 14. Business Logic
    1. Hard Ranking
    2. Learned Avoids
    3. Hand-Tuned Weights
    4. Inventory Health
    5. Implementing Avoids
    6. Model-Based Avoids
    7. Summary
  20. 15. Bias in Recommendation Systems
    1. Diversification of Recommendations
      1. Improving Diversity
      2. Applying Portfolio Optimization
    2. Multiobjective Functions
    3. Predicate Pushdown
    4. Fairness
    5. Summary
  21. 16. Acceleration Structures
    1. Sharding
    2. Locality Sensitive Hashing
    3. k-d Trees
    4. Hierarchical k-means
    5. Cheaper Retrieval Methods
    6. Summary
  22. V. The Future of Recs
  23. 17. Sequential Recommenders
    1. Markov Chains
      1. Order-Two Markov Chain
      2. Other Markov Models
    2. RNN and CNN Architectures
    3. Attention Architectures
      1. Self-Attentive Sequential Recommendation
      2. BERT4Rec
      3. Recency Sampling
      4. Merging Static and Sequential
    4. Summary
  24. 18. What’s Next for Recs?
    1. Multimodal Recommendations
    2. Graph-Based Recommenders
      1. Neural Message Passing
      2. Applications
      3. Random Walks
      4. Metapath and Heterogeneity
    3. LLM Applications
      1. LLM Recommenders
      2. LLM Training
      3. Instruct Tuning for Recommendations
      4. LLM Rankers
      5. Recommendations for AI
    4. Summary
  25. Index
  26. About the Authors

Product information

  • Title: Building Recommendation Systems in Python and JAX
  • Author(s): Bryan Bischof, Hector Yee
  • Release date: December 2023
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492097990