How Criteo optimized and sped up its TensorFlow models by 10x and served them under 5 ms

by Nicolas Kowalski, Axel Antoniotti

Released February 2020

Publisher(s): O'Reilly Media, Inc.

ISBN: 0636920372523

Video description

When you access a web page, bidders such as Criteo must decide within a few dozen milliseconds whether to purchase the advertising space on that page. A real-time auction takes place at that moment, and once communication delays are subtracted, only a handful of milliseconds remain to compute exactly how much to bid. In the past year, Criteo has put a large amount of effort into reshaping the in-house machine learning stack responsible for making such predictions, in particular by opening it to new technologies such as TensorFlow.

Unfortunately, even for simple logistic regression models and small neural networks, Criteo’s initial TensorFlow implementations saw inference time increase by a factor of 100, from 300 microseconds to 30 milliseconds.

Nicolas Kowalski and Axel Antoniotti outline how Criteo approached this issue. They discuss how Criteo profiled its models to find the bottleneck; why commonly shared solutions such as optimizing the TensorFlow build for the target hardware, freezing and cleaning up the model, and using accelerated linear algebra (XLA) proved lackluster; and how Criteo rewrote its models from scratch, reimplementing cross-features and hashing functions with low-level TF operations in order to factorize as many of the TensorFlow nodes in its models as possible.
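To make the idea of a hashed cross-feature concrete, here is a minimal sketch in plain Python. It is not Criteo's implementation and does not use TensorFlow: the bucket count, the separator string, and the function names are all hypothetical, and CRC32 merely stands in for whatever hash function a production model would use. The point it illustrates is that two categorical values can be joined and hashed once into a bucket index by a single shared routine.

```python
import zlib

# Hypothetical bucket count, chosen only for illustration.
NUM_BUCKETS = 1000


def hash_bucket(value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a string to a stable bucket index in [0, num_buckets)."""
    # CRC32 is a stand-in hash; it is deterministic across runs,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def cross_feature(left: str, right: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Cross two categorical values by joining them, then hashing once."""
    # The "_X_" separator is an assumption; any separator that cannot
    # appear inside the raw values would do.
    return hash_bucket(f"{left}_X_{right}", num_buckets)


def cross_batch(lefts, rights, num_buckets: int = NUM_BUCKETS):
    """Apply the same crossing routine to a whole batch of value pairs."""
    return [cross_feature(l, r, num_buckets) for l, r in zip(lefts, rights)]


if __name__ == "__main__":
    # Hypothetical feature values for two categorical features.
    countries = ["FR", "US", "FR"]
    devices = ["mobile", "desktop", "mobile"]
    print(cross_batch(countries, devices))
```

Identical input pairs always land in the same bucket, so the first and third entries of the printed list match. In a TensorFlow graph, the analogous step would be expressed with string and hashing operations applied to whole tensors rather than a Python loop.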

Prerequisite knowledge

A basic understanding of how TensorFlow and TensorFlow Serving work
Experience optimizing TensorFlow models for serving (useful but not required)

What you'll learn

Understand how to optimize a TensorFlow model before serving it online
Discover how to profile a TensorFlow model with a complex preprocessing architecture
Learn how and when to replace feature columns with custom cross-features and hashing functions to factorize and drastically reduce the number of nodes in the model

This session is from the 2019 O'Reilly TensorFlow World Conference in Santa Clara, CA.

How Criteo optimized and sped up its TensorFlow models by 10x and served them under 5 ms - Nicolas Kowalski (Criteo), Axel Antoniotti (Criteo)
