Appendix B. Cloudera’s Distribution Including Apache Hadoop
Cloudera’s Distribution Including Apache Hadoop (hereafter CDH) is an integrated Apache Hadoop-based stack containing all the components needed for production use, tested and packaged to work together. Cloudera makes the distribution available in several formats: Linux packages, virtual machine images, tarballs, and scripts for running CDH in the cloud. CDH is free, released under the Apache 2.0 license, and available at http://www.cloudera.com/hadoop/.
As of CDH3, the following components are included, many of which are covered elsewhere in this book:
HDFS: Self-healing distributed filesystem
MapReduce: Powerful, parallel data processing framework
Hadoop Common: A set of utilities that support the Hadoop subprojects
HBase: Hadoop database for random read/write access
Hive: SQL-like queries and tables on large datasets
Pig: Dataflow language and compiler
Oozie: Workflow for interdependent Hadoop jobs
Sqoop: Integrate databases and data warehouses with Hadoop
Flume: Highly reliable, configurable streaming data collection
ZooKeeper: Coordination service for distributed applications
Hue: User interface framework and SDK for visual Hadoop applications
Whirr: Libraries and scripts for running Hadoop and related services in the cloud
Mahout: Scalable machine-learning and data-mining algorithms
Cloudera also provides Cloudera Manager for deploying and operating Hadoop clusters running CDH.
To download CDH and Cloudera Manager, visit ...