Appendix B. Cloudera’s Distribution for Hadoop

by Matt Massie and Todd Lipcon, Cloudera

Cloudera’s Distribution for Hadoop is based on the most recent stable version of Apache Hadoop, with numerous patches, backports, and updates. Cloudera distributes it in several formats: compressed tar files, RPMs, Debian packages, and Amazon EC2 AMIs. Cloudera’s Distribution for Hadoop is free, released under the Apache 2.0 license, and available at

Cloudera has an online configurator at to make setting up a Hadoop cluster easy (Figure B-1). The configurator has a simple wizard-like interface that asks targeted questions about your cluster. When you’ve finished, the configurator generates customized Hadoop packages and places them in a package repository for you. You can manage any number of clusters and return at a later time to update your active configurations.

Figure B-1. Cloudera’s online configurator makes it easy to set up a Hadoop cluster

To simplify package management, Cloudera distributes RPMs from a yum repository and Debian packages from an apt repository. With Cloudera’s Distribution for Hadoop, you can install and configure Hadoop on each machine in your cluster by running a single command. Kickstart users benefit even more, because they can commission entire Hadoop clusters automatically ...
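
As a rough illustration of the single-command install described above, the following Python sketch builds the command line you might run on each machine, depending on whether it uses RPMs or Debian packages. The package name `hadoop` is a placeholder assumed for this example, since the actual package names are not given here; the helper `build_install_command` is likewise hypothetical. Only the `yum install -y` and `apt-get install -y` invocations are standard package-manager usage.

```python
# A minimal sketch, not Cloudera's tooling: it only assembles the
# package-manager command for one machine and does not run it.

# Placeholder package name (an assumption for illustration only).
HYPOTHETICAL_PACKAGE = "hadoop"

# Standard invocations; -y answers confirmation prompts automatically.
INSTALL_COMMANDS = {
    "rpm": ["yum", "install", "-y"],
    "deb": ["apt-get", "install", "-y"],
}


def build_install_command(package_format, package=HYPOTHETICAL_PACKAGE):
    """Return the argument list that would install `package`."""
    try:
        base = INSTALL_COMMANDS[package_format]
    except KeyError:
        raise ValueError(f"unsupported package format: {package_format!r}")
    # Copy the base list so the shared table is never mutated.
    return list(base) + [package]


if __name__ == "__main__":
    for fmt in ("rpm", "deb"):
        print(" ".join(build_install_command(fmt)))
```

Returning an argument list rather than a single string means the result can be handed to `subprocess.run` without shell quoting, should you choose to execute it once the repository has been configured on the machine.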
