Chapter 11. Twitter: When Everything Happens in Real Time

Twitter is all about real time, so it needs infrastructure that can ingest data, run analytics, and deliver insights in real time. As engineering manager for real-time compute at Twitter, I managed the infrastructure that makes real-time analytics computation possible. My team was responsible for all of that infrastructure and the tools built around it.

I came to Twitter in 2013 when it acquired my company, which did real-time processing of GPS, or spatial, data. Twitter wanted technology for processing real-time data, and we joined to build its next-generation real-time platform.

Twitter Develops Heron

Heron, our real-time, distributed, fault-tolerant stream-processing engine, has powered all of Twitter’s real-time analytics since 2014. After it was deployed, incident reports dropped by an order of magnitude, demonstrating that Heron was both reliable and scalable. Because Heron is API-compatible with the popular Apache Storm, existing Storm applications can migrate to Heron without code changes. Twitter open-sourced Heron in mid-2016.

When my team joined Twitter, the need for real-time analytics was growing exponentially. The company had one of the first implementations of the real-time processing system called Apache Storm, another popular software project that was open sourced by Twitter ...
