Kafka integration with Hadoop

Resource sharing, stability, availability, and scalability are a few of the many challenges of distributed computing. Nowadays, an additional challenge is processing extremely large volumes of data, measured in terabytes (TB) or petabytes (PB).

Introduction to Hadoop

Hadoop is a large-scale, distributed batch-processing framework that parallelizes data processing across many nodes and addresses these challenges of distributed computing, including the processing of big data.

Hadoop works on the principle of the MapReduce framework (introduced by Google), which provides a simple interface for parallelizing and distributing large-scale computations: a map function transforms input records into intermediate key/value pairs, and a reduce function aggregates all values that share a key. Hadoop also has its own distributed filesystem, HDFS (Hadoop Distributed File System). In any ...
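To make the map-and-reduce flow concrete, the following is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API. It is not taken from this chapter; the class name and the command-line input/output paths are illustrative assumptions.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: for each word in the input split, emit (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum all counts emitted for the same word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on mappers
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Hypothetical HDFS input and output paths supplied on the command line.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The framework handles splitting the input, scheduling map and reduce tasks across nodes, and shuffling intermediate pairs by key, so the programmer supplies only the two functions above.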
