Chapter 10. The Evolution of Large-Scale Data Processing

You have now arrived at the final chapter in the book, you stoic literate, you. Your journey will soon be complete!

To wrap things up, I’d like you to join me on a brief stroll through history, starting back in the ancient days of large-scale data processing with MapReduce and touching upon some of the highlights over the ensuing decade and a half that have brought streaming systems to the point they’re at today. It’s a relatively lightweight chapter in which I make a few observations about important contributions from a number of well-known systems (and a couple maybe not-so-well-known ones) and refer you to a bunch of source material you can go read on your own should you want to learn more, all while attempting not to offend or inflame the folks responsible for systems whose truly impactful contributions I’m going to either oversimplify or ignore completely for the sake of space, focus, and a cohesive narrative. Should be a good time.

On that note, keep in mind as you read this chapter that we’re really just talking about specific pieces of the MapReduce/Hadoop family tree of large-scale data processing here. I’m not covering the SQL arena in any way, shape, or form; we’re not talking HPC/supercomputers, and so on. So as broad and expansive as the title of this chapter might sound, I’m really focusing on a specific vertical swath of the grand universe of large-scale data processing. Caveat literatus, and all that.

Also note that ...
