Twitter's Real-Time Data Stack

Video description

In 2016, Twitter open sourced two powerful real-time data tools: DistributedLog, a high-performance log service, and Heron, a distributed stream computation system.

A few weeks after Heron was open sourced, Karthik Ramasamy, engineering manager and technical lead for real-time analytics at Twitter, gave a talk at Strata + Hadoop World in London to present the system and discuss:

  • An overview of Heron as a micro stream engine and its architectural components
  • How Twitter has been running Heron in production
  • The operational experience and challenges of running Heron at scale, including a discussion of stragglers
  • Heron's minimal resource usage and performance numbers

Shortly before Twitter open sourced DistributedLog, Sijie Guo, a software engineer and tech lead of the DistributedLog project, spoke at Strata + Hadoop World in San Jose to introduce the service. Topics covered in his talk include:

  • Why Twitter built DistributedLog
  • Technical decisions and challenges behind building DistributedLog
  • How Twitter uses DistributedLog to support different workloads
  • How Twitter runs the same software stack in multiple data centers to achieve global consistency


Product information

  • Title: Twitter's Real-Time Data Stack
  • Author(s): Nicole Tache
  • Release date: July 2016
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491969694