Video description
The Trill data engine is the power behind many of Microsoft’s offerings, from products like Azure Stream Analytics to billion-dollar services like Bing Ads. It has now been open-sourced and is available to everyone. But it has been a long path to get there.
James Terwilliger, Badrish Chandramouli, and Jonathan Goldstein explore decades of streaming data processing at Microsoft: a beginning in research, a first product in StreamInsight, the transition to the cloud, and all the pain points along the way. A key result of that lineage and learning is the Trill engine, which has three defining properties: a single standalone data processing engine for all temporal data, whether the data is streamed or stored; a simple API that integrates seamlessly with the programming language; and performance without ego, a willingness to use every lesson learned to improve throughput in every way possible.
They dive deep into why each of those properties is important through three examples: a simple application that demonstrates the basics of Trill (joins, aggregation, and windowing); a more complicated application that demonstrates the power of Trill’s API (progressive windowing, regular expressions and pattern detection, and data-dependent windows); and an overview of the kind of query used by Bing Ads, a query that runs a multi-billion-dollar business.
You’ll see a performance showcase that runs the previous examples to demonstrate how Trill got its name: processing a trillion events per day on a single node, which works out to roughly 11.6 million events per second.
Prerequisite knowledge
- A working knowledge of streaming data systems (useful but not required)
What you'll learn
- Learn about temporal data versus temporal queries, and about data-dependent and custom temporal windowing
This session is from the 2019 O'Reilly Strata Conference in New York, NY.
Product information
- Title: Trill: The crown jewel of Microsoft’s streaming pipeline explained
- Author(s):
- Release date: February 2020
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 0636920371922