When two trends fuse: PyTorch and recommender systems
A look at the rise of the deep learning library PyTorch and simultaneous advancements in recommender systems.
In the last few years, we have experienced a resurgence of neural networks, owing to the availability of large data sets, increased computational power, innovations in model building via deep learning, and, most importantly, open source software libraries that make these tools accessible to non-researchers. In 2016, the rapid rise of the TensorFlow library for building deep learning models allowed application developers to take state-of-the-art models and put them into production. Research and application development based on deep neural networks is currently a very fast-moving field, and in 2017 we have seen the emergence of another deep learning library, PyTorch. At the same time, researchers in recommender systems continue to pioneer new ways to improve performance as the number of users and items grows. In this post, we will discuss the rise of PyTorch and how its flexibility and native Python integration make it an ideal tool for building recommender systems.
Differentiating PyTorch and TensorFlow
The commonalities between TensorFlow and PyTorch stop at both being general-purpose analytic problem-solving libraries and both using Python as their primary interface. PyTorch's roots are in dynamic libraries such as Chainer, where the operations in a computation graph execute immediately as the code runs. This contrasts with TensorFlow-style design, where the computation graph is first defined and compiled, then executed as a whole. (Note: TensorFlow has recently added eager execution, which runs operations immediately instead of building a static graph first.)
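To make the "execute immediately" idea concrete, here is a minimal PyTorch sketch of eager execution. The function `scale_or_shift` and its threshold are hypothetical, chosen only for illustration: each line runs as soon as Python reaches it, and an ordinary `if` statement can branch on the actual value of a tensor, something a graph that is compiled before any data arrives cannot do directly.

```python
import torch

# In eager (define-by-run) execution, each operation runs as soon as Python
# reaches it, so intermediate results are ordinary tensors you can inspect.
x = torch.tensor([1.0, 2.0, 3.0])
y = x * 2  # computed right here, not deferred to a later graph run
print(y)


def scale_or_shift(t: torch.Tensor) -> torch.Tensor:
    """Hypothetical example: choose an operation based on the data itself."""
    if t.sum() > 10:  # evaluated on real values, not a symbolic placeholder
        return t / 2
    return t + 1


small = scale_or_shift(torch.tensor([1.0, 2.0]))   # sum is 3, so 1 is added
large = scale_or_shift(torch.tensor([10.0, 20.0]))  # sum is 30, so it is halved
print(small, large)
```

Because the branch is plain Python, you can also set a breakpoint inside `scale_or_shift` and inspect `t` with a normal debugger.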
The rise of PyTorch
PyTorch was created to address challenges in the adoption of its predecessor library, Torch. Because the programming language Lua was not widely popular and users were generally unwilling to learn it, Torch, a mainstay in computer vision for several years, never saw the explosive growth of TensorFlow. Both Torch and PyTorch are primarily developed, managed, and maintained by the team at Facebook AI Research (FAIR). PyTorch has seen rising adoption because its native, Python-style imperative programming is already familiar to researchers, data scientists, and developers who work with popular Python libraries such as NumPy and SciPy. This flexible, imperative approach to building deep learning models makes debugging easier than with a compiled model. In a compiled model, errors are not detected until the computation graph is submitted for execution; in a Define-by-Run-style PyTorch model, errors surface, and can be debugged, as the model is defined and run. This flexibility is especially important for models whose architecture changes based on the input. Researchers using recurrent neural network (RNN)-based approaches for language understanding, translation, and other problems involving variable-length sequences have taken a particular liking to the Define-by-Run approach. Lastly, PyTorch's built-in automatic differentiation gives model builders an easy way to compute the gradients needed for the error-reducing backpropagation step.
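The sketch below illustrates the three points above in PyTorch: a loop whose length depends on the input sequence (as in an RNN), automatic differentiation through whichever graph was actually executed, and an error that is raised on the offending line. The toy recurrent cell (`W_in`, `W_h`, `encode`) is a hypothetical stand-in, not a real RNN layer such as `torch.nn.RNN`.

```python
import torch

torch.manual_seed(0)
hidden_size = 4

# Parameters of a toy recurrent cell; requires_grad=True tells autograd to track them.
W_in = torch.randn(hidden_size, requires_grad=True)
W_h = torch.randn(hidden_size, hidden_size, requires_grad=True)


def encode(sequence: torch.Tensor) -> torch.Tensor:
    """Apply the toy recurrence once per element of a 1-D sequence of any length."""
    h = torch.zeros(hidden_size)
    for value in sequence:  # the number of steps is decided by the input, at run time
        h = torch.tanh(value * W_in + h @ W_h)
    return h


short_seq = torch.tensor([0.5, -1.0])
long_seq = torch.tensor([0.1, 0.2, 0.3, 0.4, 0.5])

# Each call records a graph of a different depth; backward() differentiates
# through whatever was actually executed.
loss = encode(short_seq).sum() + encode(long_seq).sum()
loss.backward()
print(W_in.grad.shape, W_h.grad.shape)

# A shape error is raised on the offending line as it runs,
# not at a later graph-compilation step.
mismatch_caught = False
try:
    torch.zeros(3) @ W_h  # (3,) @ (4, 4) is invalid
except RuntimeError:
    mismatch_caught = True
```

In a real model you would use `torch.nn.RNN` or `torch.nn.LSTM` with padding or packed sequences; the explicit loop here just makes the run-time graph construction visible.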
Late in the summer of 2017, with release 0.2.0, PyTorch achieved a significant milestone by adding distributed training of deep learning models, a common necessity for reducing training time on large data sets. Furthermore, the ability to translate PyTorch models to Caffe2 (another library from FAIR) was added via the Open Neural Network Exchange (ONNX). ONNX helps those struggling to put PyTorch into production by generating an intermediate representation of the model that can be transferred to the Caffe2 library for deployment on targets ranging from servers to mobile devices. ONNX can also be used to transfer PyTorch models to other participating libraries.
Recently, we have seen further validation of PyTorch’s rise in problem-solving approaches built on top of the library. The engineering team at Uber, the popular ride-sharing company, has built Pyro, a universal probabilistic programming language that uses PyTorch as its back end. The decision to use PyTorch was driven by its native automatic differentiation and its ability to construct gradients dynamically, which is necessary for random operations in a probabilistic model. In another notable development, the popular deep learning training site fast.ai announced it was switching future course content to PyTorch rather than Keras-TensorFlow. In addition to PyTorch's core features, the fast.ai team noted that a majority of the top scorers in Kaggle’s “Understanding the Amazon from space” challenge, which uses satellite data to track the human footprint in the Amazon rainforest, used PyTorch. Its fast pace of adoption and its extensibility are making PyTorch a library of choice for researchers and application developers alike.
Another explosive trend: Recommender systems
Just as software libraries for deep learning are growing, user-generated content and user behavior signals have grown explosively over the last decade. In our vast ocean of consumption choices, better methods of curation have become ever more important. For the last few decades, recommender systems have led the way in tailoring user experiences by matching user interests with the right product, content, or action. With growing numbers of users and items, simple deductive recommendation (wine-cheese-crackers, or James Bond-Mission Impossible-Jason Bourne movies) has become challenging. Techniques such as memory-based collaborative filtering, which recommends items using similarity measures between users or items, perform poorly once user and item data becomes sparse, as is the case in most content and product applications. Take, for example, a small system with 100K users and 10K items. Few users will have experienced, purchased, or rated more than 100 items; even if every user had rated exactly 100, only 1% of the 1 billion (100K × 10K) cells in the user-item matrix would be filled. The resulting matrix is extremely sparse, making it difficult to provide valid recommendations that are not purely random guesses.
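The following standard-library sketch reproduces the sparsity arithmetic above and shows, on a tiny hypothetical data set, why memory-based collaborative filtering struggles when ratings are sparse. The helpers `cosine` and `predict` and the toy ratings are illustrative assumptions: `predict` is a simple user-based weighted average, one common memory-based variant. A user with no items in common with anyone gets zero similarity everywhere, so no prediction can be made at all.

```python
import math

# Sparsity for the example in the text, assuming every user rated at most 100 items.
num_users, num_items, max_items_per_user = 100_000, 10_000, 100
cells = num_users * num_items                    # possible user-item pairs
max_observed = num_users * max_items_per_user    # upper bound on observed ratings
density = max_observed / cells                   # fraction of the matrix filled

# Hypothetical toy ratings, stored sparsely as {user: {item: stars}}.
ratings = {
    "alice": {"dr_no": 5, "goldfinger": 4},
    "bob": {"dr_no": 4, "goldfinger": 5, "bourne_identity": 4},
    "carol": {"amelie": 5},  # shares no items with the other users
}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse rating vectors (missing entries count as 0)."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    if dot == 0 or norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(user: str, item: str):
    """Similarity-weighted average of other users' ratings for `item`, or None."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None


print(f"density <= {density:.2%}")
print(predict("alice", "bourne_identity"))  # Alice overlaps with Bob
print(predict("carol", "bourne_identity"))  # Carol overlaps with nobody
```

In a real system with a 1% (or lower) fill rate, the "Carol" case is the norm rather than the exception, which motivates the model-based approaches discussed next.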
An alternative to user-item distance measurement is to learn the underlying relationships between users and items and build a predictive model for recommendation. For example, say our goal is to recommend movies that a user is predicted to rate 4 or more stars out of 5. In this rule-learning approach, data scientists typically divide the historical preference data into training, validation, and test sets, as they would for supervised learning models. Commonly used rule-learning techniques such as alternating least squares and support vector machines were state of the art in the previous decade. Among the many recent advances in recommender systems, two key concepts help solve the challenges faced by large-scale systems: Wide & Deep Learning for Recommender Systems (by a team at Google) and deep matrix factorization (the subject of several papers by other researchers).
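A minimal standard-library sketch of framing recommendation as supervised learning, as described above: each historical (user, item, stars) triple is labeled 1 if the rating is 4 stars or more, and the data is shuffled and split into training, validation, and test sets. The helper `split_ratings`, the 80/10/10 proportions, and the randomly generated data are all hypothetical choices for illustration.

```python
import random


def split_ratings(rows, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle rows reproducibly and cut them into train/validation/test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


# Hypothetical data: 1,000 (user_id, item_id, stars) triples.
rng = random.Random(0)
data = [(rng.randrange(100), rng.randrange(50), rng.randint(1, 5)) for _ in range(1000)]

# Label each example for the goal "predicted to be rated 4+ stars out of 5".
labeled = [(user, item, 1 if stars >= 4 else 0) for user, item, stars in data]

train, val, test = split_ratings(labeled)
print(len(train), len(val), len(test))
```

A random split is the simplest option; for recommenders, splitting by timestamp (training on older ratings and testing on newer ones) is often a more realistic choice when that information is available.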
The core idea behind Wide & Deep Learning is to jointly train a wide network and a deep network. In our example, the wide network learns the underlying rules that would generate a high rating for a given pair of recommendation request and item. Meanwhile, the sparse user behavior vectors are mapped to dense representations using a state-of-the-art feature-vector transformation model (for example, word2vec), and a deep neural network is trained with these dense vectors as input and the target rating as output. This approach was put into production in the Google Play Store for mobile app recommendation using a champion-challenger deployment model. The deep matrix factorization concept, by contrast, attempts to learn the non-linear relationships between users and items: the model takes a user-item pair as input to a neural network and outputs the predicted rating.
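Below is a hedged PyTorch sketch of both ideas, trained on a tiny synthetic batch rather than real data. All class names, dimensions, and hyperparameters are hypothetical. `WideAndDeep` adds the logit of a linear "wide" model over sparse multi-hot features to the logit of a "deep" MLP over dense user and item embeddings, so a single loss trains both parts jointly. Unlike the pre-trained word2vec-style vectors mentioned above, this sketch learns the embeddings during training. `DeepMF` shows one common variant of deep matrix factorization, in which concatenated user and item embeddings pass through an MLP (in the style of neural collaborative filtering) instead of a plain dot product. Published deep matrix factorization models differ in their details.

```python
import torch
import torch.nn.functional as F
from torch import nn


class WideAndDeep(nn.Module):
    """Wide linear model plus deep MLP over embeddings, summed into one logit."""

    def __init__(self, num_users, num_items, num_wide_features, emb_dim=16, hidden=32):
        super().__init__()
        # Wide part: a linear model over sparse multi-hot (e.g. cross-product) features.
        self.wide = nn.Linear(num_wide_features, 1)
        # Deep part: dense user/item embeddings fed through a small MLP.
        self.user_emb = nn.Embedding(num_users, emb_dim)
        self.item_emb = nn.Embedding(num_items, emb_dim)
        self.deep = nn.Sequential(nn.Linear(2 * emb_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, users, items, wide_x):
        deep_in = torch.cat([self.user_emb(users), self.item_emb(items)], dim=1)
        # Both parts feed the same logit, so one loss trains them jointly.
        return (self.wide(wide_x) + self.deep(deep_in)).squeeze(1)


class DeepMF(nn.Module):
    """User/item embeddings passed through an MLP to predict a rating."""

    def __init__(self, num_users, num_items, emb_dim=16, hidden=32):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, emb_dim)
        self.item_emb = nn.Embedding(num_items, emb_dim)
        # A non-linear MLP replaces the dot product of classic matrix factorization.
        self.mlp = nn.Sequential(nn.Linear(2 * emb_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, users, items):
        x = torch.cat([self.user_emb(users), self.item_emb(items)], dim=1)
        return self.mlp(x).squeeze(1)


# Hypothetical synthetic batch, only to show shapes and a training loop.
torch.manual_seed(0)
num_users, num_items, num_wide = 50, 40, 30
users = torch.randint(0, num_users, (8,))
items = torch.randint(0, num_items, (8,))
wide_x = (torch.rand(8, num_wide) < 0.1).float()  # sparse multi-hot features
ratings = torch.randint(1, 6, (8,)).float()       # 1 to 5 stars
liked = (ratings >= 4).float()                    # target: rated 4+ stars

wd, dmf = WideAndDeep(num_users, num_items, num_wide), DeepMF(num_users, num_items)
wd_opt = torch.optim.Adam(wd.parameters(), lr=1e-2)
dmf_opt = torch.optim.Adam(dmf.parameters(), lr=1e-2)

wd_losses, dmf_losses = [], []
for step in range(200):
    # Wide & Deep as a classifier: will this user-item pair be rated 4+ stars?
    wd_opt.zero_grad()
    wd_loss = F.binary_cross_entropy_with_logits(wd(users, items, wide_x), liked)
    wd_loss.backward()
    wd_opt.step()
    # Deep MF as a regressor on the raw star rating.
    dmf_opt.zero_grad()
    dmf_loss = F.mse_loss(dmf(users, items), ratings)
    dmf_loss.backward()
    dmf_opt.step()
    wd_losses.append(wd_loss.item())
    dmf_losses.append(dmf_loss.item())
```

With a real data set such as MovieLens, you would feed mini-batches through a `torch.utils.data.DataLoader` and evaluate on the held-out validation and test sets rather than on the training batch.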
It follows naturally that the fast-rising PyTorch library should be used to test these new approaches to recommender systems. In March 2018, at the Strata Data Conference in San Jose, we will do exactly that in a tutorial format, using the popular MovieLens data set to build traditional, Wide & Deep, and deep matrix factorization models for recommendation.