Building and deploying large-scale machine learning applications
The O’Reilly Data Show Podcast: Danny Bickson on recommenders, data science, and applications of machine learning.
In this episode of the O’Reilly Data Show, I spoke with Danny Bickson, co-founder and VP at Dato, and the principal organizer of the Data Science Summit (full disclosure: I’m a member of the conference organizing committee). Among machine learning students and practitioners, recommender systems have become somewhat of a canonical use case and application. One of the early and popular building blocks was GraphLab’s collaborative filtering toolkit, a library originally written and maintained by Bickson. He has continued to keep tabs on the latest developments in recommenders and continues to help organize workshops on related topics throughout the world.
In recent years Bickson has turned his focus toward helping companies deploy machine learning systems in production in a wide range of real-world settings. Here are some highlights from our conversation:
Building a toolkit for collaborative filtering
It was kind of accidental. I was working on my Ph.D.—a lot of linear models, like linear systems of equations and interactive solvers. Matrix factorization, which is the base algorithm behind collaborative filtering, is very related to linear systems. It can be thought of as some kind of extension, and it’s more powerful. … When I was at CMU, I heard a lecture by a guy who’s now a researcher at Facebook, who actually worked on what they call Bayesian Tensor Factorization. This work drew me toward the domain, and I started to look into it. His code was in Matlab, so I tried to re-implement it on our system, GraphLab.
Initially, when we started the project, we had what we call a framework, which is like an API for graph analytics. But we found out that not many people are interested in just writing code for a framework because it’s a very low level and it’s not that intuitive. … Once we started to package algorithms on top of the framework, then we became way more popular because people wanted to use pre-made building blocks.
One of the reasons behind the success of this toolkit was that we started to compete in what was a relatively known competition called ACM KDD Cup. It was back in 2011. … When we started to compete using our code, we actually did something that was counter-intuitive: we shared our code during the competition, and then people, if they downloaded it, could improve their own results. That got us very quickly to hundreds of downloads, and a lot of companies were involved in this competition, so that opened a lot of doors for us in industry.
The pillar stone of recommender systems research started with the Netflix competition, which, I guess, most of us know. … That was for movie recommenders. Their main assumptions were that you echoed information about user-to-movie interaction and their scores. That’s a kind of program that we are all very familiar with. There are hundreds of research papers. It is an explored domain where we are very good.
The areas that need a bit more attention are those where you have additional data. … [where] you also know the day of the week, and the time, and which type of iPhone the user had, and what the user’s age and zip code are, what the item color and price is, and so on. Once you throw in more information, of course you can build richer models, but then the complexity goes up.
… You can have models that rely on user behavior. You can have separate models that rely on activity data, like finding similar cars to the ones previously sold, and so on. There are models based on text description of products. We have models based on user reviews of products, text reviews, and sentiments. There are models that even take into account images of products. But the most interesting models are hybrid models that combine a lot of types of inputs because companies have very rich information. Currently, they’re using just a small fraction of that information to make the predictions, but once they gather more information they can have better models and more accurate models. That is what’s most interesting to me personally.
Deep learning using Dato
As you know, deep learning is one of the hottest techniques in machine learning, so we did want to have a foot in this domain. So far, we have an initial version, which supports convolutional neural networks. But the good news is that we hired some of the people behind MXNet, which is one of the emerging deep learning platforms, and there you have a lot of other algorithms, including RNN, and you also have features like support of multiple GPUs.
- The evolution of GraphLab (a previous episode of the Data Show featuring Dato co-founder/CEO Carlos Guestrin)
- Practical machine learning techniques for building intelligent applications (a previous episode of the Data Show featuring Mikio Braun, data scientist at Zalando)
- Stream processing and messaging systems for the IoT age (a previous episode of the Data Show featuring MC Srivas, co-founder of MapR)
- Machine Learning – an O’Reilly Learning Path