The Trill data engine is the power behind many of Microsoft’s offerings, from products like Azure Stream Analytics to billion-dollar services like Bing Ads. It has now been open-sourced and is available to everyone. But it has been a long path to get there.
James Terwilliger, Badrish Chandramouli, and Jonathan Goldstein explore the history of decades of streaming data processing at Microsoft: a beginning in research, a first product in StreamInsight, the transition to the cloud, and all the pain points along the way. A key result of that lineage and learning has been the Trill engine, which has three key properties: a single standalone data processing engine for all temporal data, whether the data is streamed or stored; a simple API that integrates seamlessly with the programming language; and performance without ego, meaning a willingness to use every lesson learned to improve throughput in every way possible.
They dive deep into why each of those properties is important through examples: a simple application that demonstrates the basics of Trill (joins, aggregation, and windowing); a more complicated application that demonstrates the power of Trill’s API (progressive windowing, regular expressions and pattern detection, and data-dependent windows); and an overview of the kind of query used by Bing Ads, a query that runs a multi-billion-dollar business.
You’ll see a performance showcase: running the previous examples to demonstrate how Trill got its name—processing a trillion events per day on a single node.
Prerequisite knowledge
- A working knowledge of streaming data systems (useful but not required)
What you'll learn
- Learn about temporal data versus temporal queries, and about data-dependent and custom temporal windowing
This session is from the 2019 O'Reilly Strata Conference in New York, NY.
- Title: Trill: The crown jewel of Microsoft’s streaming pipeline explained
- Release date: February 2020
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 0636920371922