This report is based on a series of conference talks I gave in 2014/15:
“Turning the database inside out with Apache Samza,” at Strange Loop, St. Louis, Missouri, US, 18 September 2014.
“Making sense of stream processing,” at /dev/winter, Cambridge, UK, 24 January 2015.
“Using logs to build a solid data infrastructure,” at Craft Conference, Budapest, Hungary, 24 April 2015.
“Systems that enable data agility: Lessons from LinkedIn,” at Strata + Hadoop World, London, UK, 6 May 2015.
“Change data capture: The magic wand we forgot,” at Berlin Buzzwords, Berlin, Germany, 2 June 2015.
“Samza and the Unix philosophy of distributed data,” at UK Hadoop Users Group, London, UK, 5 August 2015.
Transcripts of those talks were previously published on the Confluent blog, and video recordings of some of the talks are available online. For this report, we have edited the content and brought it up to date. The images were drawn on an iPad, using the app “Paper” by FiftyThree, Inc.
Many people have provided valuable feedback on the original blog posts and on drafts of this report. In particular, I would like to thank Johan Allansson, Ewen Cheslack-Postava, Jason Gustafson, Peter van Hardenberg, Jeff Hartley, Pat Helland, Joe Hellerstein, Flavio Junqueira, Jay Kreps, Dmitry Minkovsky, Neha Narkhede, Michael Noll, James Nugent, Assaf Pinhasi, Gwen Shapira, and Greg Young.
Thank you to LinkedIn for funding large portions of the open source development of Kafka and Samza, to Confluent for sponsoring this report and for moving the Kafka ecosystem forward, and to Ben Lorica and Shannon Cutt at O’Reilly for their support in creating this report.
—Martin Kleppmann, January 2016