The task of recommendation is one of the most intuitive. By studying people’s explicit preferences (through ratings) or implicit preferences (through observed behavior), you can make recommendations on what one user may like by drawing similarities between the user and other users, or between the products they liked and other products. Using the underlying similarities, recommendation engines can make new recommendations to other users.
Recommendation engines are one of the best use cases for big data. It’s fairly easy to collect training data about users’ past preferences at scale, and this data can be used in many domains to connect users with new content. Spark is an open source tool of choice used across a variety of companies for large-scale recommendations:
Amazon, Netflix, and HBO all want to provide relevant film and TV content to their users. Netflix utilizes Spark, to make large scale movie recommendations to their users.
A school might want to recommend courses to students by studying what courses similar students have liked or taken. Past enrollment data makes for a very easy to collect training dataset for this task.
In Spark, there is one workhorse recommendation algorithm, Alternating Least Squares (ALS). This algorithm leverages a technique called collaborative filtering, which makes recommendations based only on which items users interacted with in the past. That is, it does ...