Shrinking and accelerating deep neural networks

Song Han on compression techniques and inference engines to optimize deep learning in production.

By Roger Chen
April 13, 2017
GFAP neural storm. (source: Jason Snyder on Flickr)

This is a highlight from a talk by Song Han, “Deep Neural Network Model Compression and an Efficient Inference Engine.” Visit Safari to view the full session from the 2016 Artificial Intelligence Conference in New York.

Deep neural networks have proven powerful for a variety of applications, but their sheer size places sobering constraints on speed, memory, and power consumption. These limitations become particularly important given the rise of mobile devices and their limited hardware resources.

In this talk, Song Han shows how compression techniques can alleviate these challenges by greatly reducing the size of deep neural nets. He also demonstrates an energy-efficient inference engine that greatly accelerates computation, making deep learning more practical as it moves from university research labs into production.
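
To give a concrete sense of what "compression" can mean here, the following is a minimal sketch of magnitude-based weight pruning, one commonly used way to shrink a network. It is an illustration under stated assumptions, not necessarily the exact method presented in the talk: the function names, the toy weight values, and the 50% sparsity target are all hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    weights:  list of floats (a flattened layer; toy data, assumed).
    sparsity: fraction in [0, 1] of weights to set to zero.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


def to_sparse(weights):
    """Keep only the nonzero entries as (index, value) pairs."""
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]


# Hypothetical weights for a tiny layer.
layer = [0.8, -0.05, 0.02, -1.3, 0.4, 0.01, -0.6, 0.03]
pruned = prune_by_magnitude(layer, sparsity=0.5)
sparse = to_sparse(pruned)
print(pruned)
print(sparse)
```

In this toy case, half of the weights are dropped and only four (index, value) pairs need to be stored. In practice, pruning is usually followed by retraining so the remaining weights can recover lost accuracy, and real savings depend on how the sparse format is stored and executed.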

