Apache Hadoop development proceeds on multiple tracks: releases in the 2.X, 3.0.X, and 3.1.X lines have been made in parallel. Hadoop 3.X branched off from Hadoop 2.X six years ago. We will look at the major improvements in the latest releases of 3.X and 2.X. In Hadoop 3.0, each major component (HDFS, YARN, and MapReduce) has seen a major overhaul, as the following quick overview shows:
- HDFS benefited from the following:
  - Erasure coding
  - Support for multiple standby NameNodes
  - Intra-DataNode balancer
- Improvements to YARN include the following:
  - Improved support for long-running services
  - Docker support and isolation
  - Enhancements in the scheduler
  - Application Timeline Service v.2
  - A new user interface for YARN
  - YARN Federation
- MapReduce received ...