Chapter 5. Installation and Configuration

Installing Hadoop

Once you’ve prepped the environment, selected the version and distribution of Hadoop, and decided which daemons will run where, you’re ready to install Hadoop. Installing Hadoop is relatively simple once the machines have been properly prepared. There are almost endless ways of doing it. The goal here, though, is to define some best practices for deployment that avoid the most common mistakes and pain points.

In all deployment scenarios, there are a few common tasks. Hadoop is always downloaded and installed in a chosen location on the filesystem. For tarball-based installs, this leaves quite a bit of flexibility but also an equal amount of ambiguity. Tarball installs are also more involved because the administrator needs to perform extra steps: creating system users, relocating the log and pid file directories, setting permissions appropriately, and so forth. If you’re not sure which installation method to use, start with RPM or Deb packages. They will save you from making common mistakes and keep you in line with the best practices the Hadoop community has developed over time.
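To make the extra tarball steps concrete, here is a minimal Python sketch of the directory part of that work. It is an illustration under stated assumptions, not a recommended tool: the `prepare_tarball_layout` helper, the `/var/log/hadoop` and `/var/run/hadoop` locations, and the optional service account name are all hypothetical. `HADOOP_LOG_DIR` and `HADOOP_PID_DIR` are the variables Hadoop reads from `hadoop-env.sh` to locate its log and pid files. Creating the system users themselves (for example with `useradd`) is deliberately left out because it requires root.

```python
import os
import shutil
import tempfile
from pathlib import Path


def prepare_tarball_layout(root, owner=None, group=None, mode=0o755):
    """Create Hadoop log and pid directories outside the unpacked tarball.

    Hypothetical helper for illustration only.
    root  -- filesystem prefix; an assumption, use "/" on a real node.
    owner -- hypothetical service account (e.g. "hdfs") that must already
             exist; creating it is a separate, root-only step.
    Returns the hadoop-env.sh lines that point Hadoop at the new directories.
    """
    root = Path(root)
    # Assumed locations; any path outside the Hadoop install tree works.
    log_dir = root / "var" / "log" / "hadoop"
    pid_dir = root / "var" / "run" / "hadoop"
    for d in (log_dir, pid_dir):
        d.mkdir(parents=True, exist_ok=True)
        # mkdir's mode argument is filtered by the umask, so set it explicitly.
        os.chmod(d, mode)
        if owner is not None:
            # Handing a directory to another user requires root privileges.
            shutil.chown(d, user=owner, group=group)
    return [
        f"export HADOOP_LOG_DIR={log_dir}",
        f"export HADOOP_PID_DIR={pid_dir}",
    ]


if __name__ == "__main__":
    # Dry run against a scratch directory so no system paths are touched.
    with tempfile.TemporaryDirectory() as scratch:
        for line in prepare_tarball_layout(scratch):
            print(line)
```

The returned lines would be appended to `hadoop-env.sh` on each node. RPM and Deb packages generally take care of the equivalent steps during installation, which is part of why they are the safer starting point.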
