Understanding Hadoop's Ecosystem

Hadoop is most often used for analytics on historical data, although it is increasingly being used for real-time data streaming as well. We have grouped the offerings of Hadoop's ecosystem into the following broad categories:

  • Data flow: This includes components that transfer data between Hadoop and other subsystems, covering real-time, batch, micro-batch, and event-driven data processing.
  • Data engine and frameworks: This category provides programming capabilities on top of Hadoop YARN or MapReduce.
  • Data storage: This category covers all types of data storage on top of HDFS.
  • Machine learning and analytics: This category covers big data analytics and machine learning ...
