The Trill data engine is the power behind many of Microsoft’s offerings, from products like Azure Stream Analytics to billion-dollar services like Bing Ads. It has now been open-sourced and is available to everyone. But it has been a long path to get there.
James Terwilliger, Badrish Chandramouli, and Jonathan Goldstein explore the history of decades of streaming data processing at Microsoft: a beginning in research, a first product in StreamInsight, the transition to the cloud, and all the pain points along the way. A major result of that lineage and learning has been the Trill engine, which has three key properties: a single standalone data processing engine for all temporal data, whether the data is streamed or stored; a simple API that integrates seamlessly with the programming language; and performance without ego, meaning a willingness to use every lesson learned to improve throughput in every way possible.
Through examples, they dive deep into why each of those properties is important: a simple application that demonstrates the basics of Trill (joins, aggregation, windowing); a more complicated application that demonstrates the power of Trill’s API (progressive windowing, regular expressions and pattern detection, data-dependent windows); and an overview of the kind of query used by Bing Ads, a query that runs a multi-billion-dollar business.
You’ll see a performance showcase that runs the previous examples to demonstrate how Trill got its name: processing a trillion events per day on a single node.
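For a rough sense of scale, the trillion-events-per-day figure can be converted to a per-second rate. The sketch below is plain arithmetic, not Trill code; it assumes the load is spread evenly over a 24-hour day (the abstract does not say so), and the function name is illustrative.

```python
# Back-of-the-envelope conversion, assuming an evenly spread load.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_second(events_per_day: float) -> float:
    """Average sustained rate needed to reach a daily event count."""
    return events_per_day / SECONDS_PER_DAY


rate = events_per_second(1e12)  # one trillion events per day
print(f"{rate:,.0f} events per second")  # prints "11,574,074 events per second"
```

Under that assumption, a single node has to sustain roughly 11.6 million events per second on average.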
Prerequisite knowledge
- A working knowledge of streaming data systems (useful but not required)
What you'll learn
- Learn about temporal data versus temporal queries, and about data-dependent and custom temporal windowing
This session is from the 2019 O'Reilly Strata Conference in New York, NY.
- Title: Trill: The crown jewel of Microsoft’s streaming pipeline explained
- Release date: February 2020
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 0636920371922