Introducing Spark and Kafka | 119
focus, specifically on batch processing. This over-specification led to an explosion of specialized
libraries, each attempting to solve a different problem. So, if you wanted to process streaming
data at scale, you would have to use another complementary library called Storm. Apache
Storm is a free and open source, scalable, fault-tolerant, distributed real-time computation
system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time
processing what Hadoop does for batch processing. Likewise, you might find it easier to query your
data using something like Hive.
So, along came Spark’s generalized abstractions for big data computing, bringing the big data
pipeline into one cohesive ...