Building a Recommendation Engine with Scala

Book Description

Learn to use Scala to build a recommendation engine from scratch and empower your website users

About This Book

  • Learn the basics of a recommendation engine and its application in e-commerce
  • Discover the tools and machine learning methods required to build a recommendation engine
  • Explore different kinds of recommendation engines using Scala libraries such as MLlib and Spark

Who This Book Is For

This book is written for those who want to learn the different tools in the Scala ecosystem to build a recommendation engine. No prior knowledge of Scala or recommendation engines is assumed.

What You Will Learn

  • Discover the tools in the Scala ecosystem
  • Understand the challenges faced in e-commerce systems and learn how you can solve those challenges with a recommendation engine
  • Familiarise yourself with machine learning algorithms provided by the Apache Spark framework
  • Build different versions of recommendation engines from practical code examples
  • Enhance the user experience by learning from user feedback
  • Dive into the various techniques of recommender systems such as collaborative, content-based, and cross-recommendations

In Detail

As the volume of data in online e-commerce systems grows, helping users narrow down their search becomes dramatically harder. The tools available in the Scala ecosystem enable developers to build a processing pipeline that meets this challenge and to create a recommendation system that accelerates business growth and strengthens brand advocacy for your clients.

This book provides you with the Scala knowledge you need to build a recommendation engine.

You'll be introduced to Scala and related tools to set the stage for the project, and you'll familiarise yourself with the stages of the data processing pipeline, including the stages at which you can leverage the power of Scala and related tools. You'll also discover different machine learning algorithms using MLlib.

As the book progresses, you will gain detailed knowledge of what constitutes a collaborative-filtering-based recommendation and explore different methods to improve users’ recommendations.

Style and approach

A step-by-step guide full of real-world, hands-on examples of Scala recommendation engines. Each example is placed in context with explanation and visuals.

Downloading the example code for this book: you can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to receive the code files.

Table of Contents

  1. Building a Recommendation Engine with Scala
    1. Table of Contents
    2. Building a Recommendation Engine with Scala
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    8. 1. Introduction to Scala and Machine Learning
      1. Setting up Scala, SBT, and Apache Spark
      2. A quick introduction to Scala
        1. Case classes
        2. Tuples
        3. Scala REPL
        4. SBT – Scala Build Tool
        5. Apache Spark
          1. Setting up a standalone Apache Spark cluster
          2. Apache Spark – MLlib
      3. Machine learning and recommendation engines
      4. Summary
    9. 2. Data Processing Pipeline Using Scala
      1. Entree – a sample dataset for recommendation systems
        1. Data analysis of the Entree dataset
      2. ETL – extract transform load
        1. Extract
        2. Transform
        3. Load
      3. Extraction and transformation for machine learning
        1. Types of data
          1. Discrete
          2. Continuous
          3. Categorical
        2. Cleaning the data
          1. Missing data
          2. Normalization
          3. Standardization
      4. Setting up MongoDB and Apache Kafka
        1. Setting up MongoDB
        2. Setting up Apache Kafka
      5. Data processing pipeline for Entree
        1. How does it relate to information retrieval?
      6. Summary
    10. 3. Conceptualizing an E-Commerce Store
      1. Importance of recommender systems in e-commerce
        1. Converting browsers into buyers
        2. Making cross-sell happen
        3. Increased loyalty time
      2. Types of recommendation methods
        1. Frequently bought together
        2. An example of frequent patterns
        3. People to people correlation
        4. Customer reviews and ratings
        5. People who were also interested in other similar items
        6. Recommendation from others' views
        7. Example of similar items
          1. Manual
          2. Automatic
          3. Ephemeral
          4. Persistent
      3. The architecture of the project
        1. Batch versus online
      4. Summary
    11. 4. Machine Learning Algorithms
      1. Hands on with Spark/MLlib
      2. Data types
        1. Vector
        2. Matrix
        3. Labeled point
      3. Statistics
        1. Summary statistics
        2. Correlation
        3. Sampling
        4. Hypothesis testing
        5. Random data generation
      4. Feature extraction and transformation
        1. Term frequency-inverted document frequency (TF-IDF)
        2. Word2Vec
        3. StandardScaler
        4. Normalizer
        5. Feature selection
        6. Dimensionality reduction
      5. Classification/regression
        1. Linear methods
        2. Naive Bayes
        3. Decision trees
        4. Ensembles
      6. Clustering
        1. K-Means
        2. Expectation-maximization
        3. Power iteration clustering
        4. Latent Dirichlet Allocation
          1. LDA example
      7. Association analysis
        1. Frequent pattern mining (FPGrowth)
      8. Summary
    12. 5. Recommendation Engines and Where They Fit in?
      1. Populating the Amazon dataset
      2. Creating a web app with user/product pages
        1. Creating a Play framework application
        2. The home page
        3. Product Groups
        4. Product view
        5. Customer views
      3. Adding recommendation pages
        1. The Top Rated view
        2. The Most Popular view
        3. The Monthly Trends view
      4. Summary
    13. 6. Collaborative Filtering versus Content-Based Recommendation Engines
      1. Content-based recommendation
        1. Similarity measures
          1. Pearson correlation
            1. Challenges with Pearson correlation
          2. Euclidean distance
            1. Challenges with Euclidean distance
          3. Cosine measure
          4. Spearman correlation
          5. Tanimoto coefficient
          6. Log likelihood test
      2. Content-based recommendation steps
        1. Clustering for performance
      3. Collaborative filtering based recommendation
      4. What is ALS?
        1. ALS in Apache Spark
        2. ALS on Amazon ratings
      5. Content-based versus collaborative filtering
      6. Summary
    14. 7. Enhancing the User Experience
      1. Adding product search
        1. Setting up Elasticsearch
      2. Adding recommendation listings
      3. Understanding recommendation behavior
        1. Why is that so?
        2. Logging
        3. Ranking
        4. Diversification
        5. Justification
        6. Evaluation
      4. Summary
    15. 8. Learning from User Feedback
      1. Introducing PredictionIO
        1. Installing PredictionIO
      2. Unified recommender
      3. Summary
    16. Index