Data Analytics with Hadoop

by Benjamin Bengfort, Jenny Kim
June 2016
Intermediate to advanced
286 pages
8h 9m
English
O'Reilly Media, Inc.

Appendix A. Creating a Hadoop Pseudo-Distributed Development Environment

To run the code in this book, you’ll need to set up a development environment. Hadoop developers usually test their scripts and code in a pseudo-distributed environment (also known as a single-node setup): a virtual machine that runs all of the Hadoop daemons simultaneously on a single machine.

These instructions will help you install a pseudo-distributed environment with Hadoop 2.5.0 on Ubuntu 14.04.
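
Because a pseudo-distributed setup runs every Hadoop daemon on one machine, a quick sanity check is to confirm that all of them are up. The sketch below is one possible way to do that, not a procedure taken from the book. It assumes a Hadoop 2.x installation with YARN, where the daemons are NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager, and it assumes the JDK's `jps` tool is on the PATH. The function names are hypothetical.

```python
import subprocess

# Daemons expected in a Hadoop 2.x (YARN) pseudo-distributed setup.
# Assumption: a default single-node configuration; yours may run more.
EXPECTED_DAEMONS = {
    "NameNode",
    "DataNode",
    "SecondaryNameNode",
    "ResourceManager",
    "NodeManager",
}


def running_daemons(jps_output):
    """Return the expected daemons that appear in `jps` output.

    Each `jps` line has the form "<pid> <class name>".
    """
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names & EXPECTED_DAEMONS


def missing_daemons(jps_output):
    """Return the expected daemons that are NOT listed in `jps` output."""
    return EXPECTED_DAEMONS - running_daemons(jps_output)


def read_jps():
    """Run `jps` and return its text output (requires a JDK on the PATH)."""
    result = subprocess.run(
        ["jps"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a machine where Hadoop has been started, `missing_daemons(read_jps())` should return an empty set; any names it does return are daemons that did not start.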

Quick Start

There are a couple of options if you are not familiar with systems administration on Linux, or do not wish to work through the process of installing Hadoop yourself. We have provided a VMDK for you to use in the virtualization software of your choice (e.g., VirtualBox or VMware Fusion). Alternatively, both Hortonworks and Cloudera supply virtual machines for quick download.

To get started quickly, simply download the VM and run it in your favorite virtualization software. Be aware that if you use the Cloudera or Hortonworks distributions, the environment may be subtly different from the one we use. In short, either download the preconfigured machine or follow the steps described here.
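
If you choose VirtualBox, attaching a downloaded VMDK to a new virtual machine can also be scripted with the `VBoxManage` command-line tool. The following is a minimal sketch under stated assumptions rather than the book's own procedure: the VM name `hadoop-dev`, the file name `hadoop-dev.vmdk`, the memory and CPU values, and the 64-bit Ubuntu guest type are all placeholders chosen for illustration. The function only builds the commands so that you can review them before running anything.

```python
import shlex


def vboxmanage_commands(vm_name, vmdk_path, memory_mb=2048, cpus=1):
    """Build the VBoxManage commands that create a VM around an existing VMDK.

    Nothing is executed here; each command is returned as an argument list.
    """
    return [
        # Create and register an empty VM (assumes a 64-bit Ubuntu guest).
        ["VBoxManage", "createvm", "--name", vm_name,
         "--ostype", "Ubuntu_64", "--register"],
        # Give the VM memory and CPUs; the defaults are illustrative values.
        ["VBoxManage", "modifyvm", vm_name,
         "--memory", str(memory_mb), "--cpus", str(cpus)],
        # Add a SATA controller to hold the disk.
        ["VBoxManage", "storagectl", vm_name,
         "--name", "SATA", "--add", "sata"],
        # Attach the downloaded VMDK as the VM's hard disk.
        ["VBoxManage", "storageattach", vm_name,
         "--storagectl", "SATA", "--port", "0", "--device", "0",
         "--type", "hdd", "--medium", vmdk_path],
        # Boot the VM.
        ["VBoxManage", "startvm", vm_name],
    ]


if __name__ == "__main__":
    # Hypothetical VM name and disk path; substitute your own.
    for cmd in vboxmanage_commands("hadoop-dev", "hadoop-dev.vmdk"):
        print(shlex.join(cmd))
```

Running the script prints one shell command per line, which you can paste into a terminal or pass to `subprocess.run` once the values match your download.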

If you are using the VMDK we supply, log in to the machine with the following username and password:

username: student
password: password

If you’re brave enough to set up the environment yourself, go ahead and move to the next section!

Setting Up Linux

Before you ...


Publisher Resources

ISBN: 9781491913734