Deep learning that’s easy to implement and easy to scale

The O’Reilly Data Show Podcast: Anima Anandkumar on MXNet, tensor computations and deep learning, and techniques for scaling algorithms.

By Ben Lorica
March 9, 2017
Cubes. (source: Pixabay)


In this episode of the Data Show, I spoke with Anima Anandkumar, a leading machine learning researcher, and currently a principal research scientist at Amazon. I took the opportunity to get an update on the latest developments on the use of tensors in machine learning. Most of our conversation centered around MXNet—an open source, efficient, scalable deep learning framework. I’ve been a fan of MXNet dating back to when it was a research project out of CMU and UW, and I wanted to hear Anandkumar’s perspective on its recent progress as a framework for enterprises and practicing data scientists.

Here are some highlights from our conversation:


MXNet: An efficient, fast, and easy-to-use framework for deep learning

MXNet ships with many popular deep learning architectures that have been predefined, and optimized to a great degree. If you look at benchmarks, and I’ll be showing them at Strata, you get 90% efficiency on multiple GPUs, multiple instances. These scale up much better than the other packages. The idea is if you are enabling deep learning on the cloud, efficiency becomes a very important criterion and will result in huge cost savings to the customer.

In addition, MXNet is much easier to program in terms of giving users more flexibility. There is a range of different front-end languages the user can employ and still get the same performance. … For instance, in addition to Python, you can code in R, or even JavaScript if you want to run this in the browser.

… At the same time, there is also the mixed programming paradigm, which means you can have both declarative and imperative programming. The idea is you need declarative programming if you want to do optimizations because you need the computation graph to figure out how and where to do the optimizations. On the other hand, imperative programming is easier to write, easier to debug, easier for the programmer to think sequentially. Because both options are available, the user can decide what is best to suit their needs, and which part of the program will require optimization and which parts are amenable as imperative programs.
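To make the contrast concrete, here is a minimal sketch in plain Python. It is not MXNet's API: the `Node` class and the helpers `var`, `const`, `fold_constants`, `evaluate`, and `count_ops` are hypothetical names invented for this illustration. The imperative function runs line by line, while the declarative version first builds a computation graph that an optimizer can inspect and simplify before anything executes.

```python
# Hypothetical mini "framework" for illustration only; this is NOT MXNet's API.

class Node:
    """A node in a symbolic computation graph."""
    def __init__(self, op, inputs=(), value=None, name=None):
        self.op = op          # one of "var", "const", "add", "mul"
        self.inputs = inputs  # child nodes for "add" and "mul"
        self.value = value    # payload for "const"
        self.name = name      # lookup key for "var"

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def var(name):
    return Node("var", name=name)


def const(value):
    return Node("const", value=value)


def fold_constants(node):
    """Graph-level optimization: precompute subgraphs built only from constants."""
    if node.op in ("var", "const"):
        return node
    left, right = (fold_constants(n) for n in node.inputs)
    if left.op == "const" and right.op == "const":
        if node.op == "add":
            return const(left.value + right.value)
        return const(left.value * right.value)
    return Node(node.op, (left, right))


def evaluate(node, bindings):
    """Execute the graph once concrete inputs are bound to its variables."""
    if node.op == "var":
        return bindings[node.name]
    if node.op == "const":
        return node.value
    a, b = (evaluate(n, bindings) for n in node.inputs)
    return a + b if node.op == "add" else a * b


def count_ops(node):
    """Number of add/mul operations the graph would execute."""
    if node.op in ("var", "const"):
        return 0
    return 1 + sum(count_ops(n) for n in node.inputs)


# Imperative style: every statement runs immediately, in order.
def imperative(x):
    scale = 2.0 * 3.0
    return x * scale + 1.0


# Declarative style: describe the computation first, optimize, then run.
x = var("x")
graph = x * (const(2.0) * const(3.0)) + const(1.0)
optimized = fold_constants(graph)
```

Because the whole graph is visible before execution, `fold_constants` can remove the constant multiplication ahead of time; the imperative function gives the runtime no such global view, but it is easier to step through in a debugger.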

In the benchmarks that I’ll show, it’s not just about multiple GPUs on the same machine, but also multiple different instances. MXNet has parameter servers in the back end, which allows it to seamlessly distribute across either multiple GPUs or multiple machines.
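The parameter server idea can be sketched in a few lines of plain Python. This is a toy, single-process illustration under stated assumptions, not MXNet's implementation: the `ParameterServer` class, the `worker_gradient` function, the synchronous averaging scheme, and the linear model are all hypothetical choices made for the example. Each worker pulls the current weights, computes a gradient on its own shard of the data, and pushes it back; the server averages the gradients and updates the shared weights.

```python
class ParameterServer:
    """Holds the shared weights and applies averaged gradients (synchronous update)."""
    def __init__(self, weights, learning_rate):
        self.weights = list(weights)
        self.learning_rate = learning_rate
        self._pending = []  # gradients pushed by workers during the current step

    def pull(self):
        """Workers fetch a copy of the current weights."""
        return list(self.weights)

    def push(self, gradient):
        """Workers send back the gradient computed on their shard."""
        self._pending.append(gradient)

    def apply_updates(self):
        """Average all pending gradients and take one gradient-descent step."""
        n = len(self._pending)
        for i in range(len(self.weights)):
            avg = sum(g[i] for g in self._pending) / n
            self.weights[i] -= self.learning_rate * avg
        self._pending.clear()


def worker_gradient(weights, shard):
    """Gradient of mean squared error for the model y = w0 + w1 * x on one shard."""
    g0 = g1 = 0.0
    for x, y in shard:
        err = (weights[0] + weights[1] * x) - y
        g0 += 2 * err
        g1 += 2 * err * x
    return [g0 / len(shard), g1 / len(shard)]


# Toy data drawn from y = 1 + 2x, split across two simulated workers.
data = [(float(x), 1.0 + 2.0 * x) for x in range(-4, 4)]
shards = [data[0::2], data[1::2]]

server = ParameterServer(weights=[0.0, 0.0], learning_rate=0.05)
for step in range(500):
    for shard in shards:  # in a real system these would run in parallel
        server.push(worker_gradient(server.pull(), shard))
    server.apply_updates()
```

The workers never talk to each other, only to the server, which is why the same pattern applies whether the workers are GPUs in one machine or separate machines.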

Tensor computations, deep learning, and hardware

On one hand, if you think about the tensor operations, what we call tensor contractions are extensions of matrix products. And if you look into deep learning computations, they involve tensor contractions. It becomes very important, then, to ask if you can go beyond the usual matrix computations and be able to efficiently parallelize along different hardware architectures. For instance, if you think about BLAS operations, the BLAS Level 1 are just scalar and vector operations. BLAS Level 2 are matrix-vector operations. If you go to BLAS Level 3, you are looking at matrix-matrix operations. By going to higher level BLAS, you’re able to block operations together and get better efficiency. If you go to tensors, which are extensions of the matrices, you need the higher level BLAS operations.
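As a small illustration of how a tensor contraction extends a matrix product, the sketch below contracts the last mode of a three-way tensor with a matrix in two ways: with explicit loops, and by flattening the tensor into a matrix so that a single matrix-matrix product (the BLAS Level 3 pattern) does the work. The function names are hypothetical and the code uses plain Python lists rather than any BLAS library.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def contract_last_mode(t, b):
    """C[i][j][l] = sum over k of T[i][j][k] * B[k][l], written as explicit loops."""
    return [[[sum(t[i][j][k] * b[k][l] for k in range(len(b)))
              for l in range(len(b[0]))]
             for j in range(len(t[0]))]
            for i in range(len(t))]


def contract_via_matmul(t, b):
    """Same contraction: flatten modes (i, j) into rows, multiply once, reshape."""
    n_i, n_j = len(t), len(t[0])
    flat = [t[i][j] for i in range(n_i) for j in range(n_j)]   # (I*J) x K matrix
    prod = matmul(flat, b)                                     # (I*J) x L matrix
    return [prod[i * n_j:(i + 1) * n_j] for i in range(n_i)]   # back to I x J x L


# A small 2 x 3 x 2 tensor and a 2 x 2 matrix.
T = [[[i + j + k for k in range(2)] for j in range(3)] for i in range(2)]
B = [[1, 2], [3, 4]]

loops = contract_last_mode(T, B)
blocked = contract_via_matmul(T, B)
```

Flattening works cleanly here because the contracted mode is the last one. When it is not, a naive approach has to copy or transpose the data first, which is part of the motivation for tensor-aware primitives.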

In a recent paper, we defined such extensions to BLAS, which have been added to cuBLAS 8.0. To me, this is an exciting research area: how can we enable hardware optimizations for various tensor operations and how would that improve efficiency of deep learning and other machine learning algorithms?
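The excerpt does not spell out what the extensions look like. One primitive commonly used for this purpose is a strided batched matrix multiply, which runs many small matrix products over flat buffers without copying data. The sketch below is a plain-Python illustration of that idea only; the function name, parameter names, and row-major layout are assumptions for the example and do not reproduce the cuBLAS interface.

```python
def strided_batched_gemm(m, n, k, a, stride_a, b, stride_b, c, stride_c, batch):
    """For each p in range(batch), compute C_p = A_p @ B_p.

    Matrix p of each operand starts at offset p * stride in a flat,
    row-major buffer. A stride of 0 reuses the same matrix for every batch.
    """
    for p in range(batch):
        a0, b0, c0 = p * stride_a, p * stride_b, p * stride_c
        for i in range(m):
            for j in range(n):
                acc = 0
                for x in range(k):
                    acc += a[a0 + i * k + x] * b[b0 + x * n + j]
                c[c0 + i * n + j] = acc


# Two 2 x 2 matrices stored back to back, i.e. a 2 x 2 x 2 tensor in one buffer.
A = [1, 2, 3, 4,
     5, 6, 7, 8]
# One 2 x 2 matrix shared by both batches (stride_b = 0).
B = [1, 2, 3, 4]
C = [0] * 8

strided_batched_gemm(m=2, n=2, k=2,
                     a=A, stride_a=4,
                     b=B, stride_b=0,
                     c=C, stride_c=4,
                     batch=2)
```

Expressing a contraction as strides over the original buffer means the tensor never has to be reshaped or transposed in memory, and a hardware library is free to schedule the whole batch as a single call.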

Academia and industry

The opportunity here at AWS as a principal scientist has been very timely and exciting. I’ve been given a lot of freedom to explore and to push ahead and to make these algorithms available on the AWS cloud for everybody to use, and we’ll be pushing ahead with many more such capabilities. And at the same time, we’re also, in a way, doing research here and asking how we can think about new algorithms, how we can benchmark them with large-scale experiments, and how we can talk about them at various conferences and other peer-reviewed venues. So, it’s definitely a mix of research and development here that excites me, and at the same time, I continue to advise students and continue to push the research agenda. Amazon is enabling me to do that and supporting me in that, so I see this as a joint partnership. I expect this to continue. I’ll be joining Caltech as an endowed chair, and I’m looking forward to more such engagements between industry and academia.
