Getting Apache Hadoop

The official page for Apache Hadoop is http://hadoop.apache.org. There you can find in-depth documentation, manuals, and releases of Apache Hadoop. Hadoop is written in Java, so a Java Virtual Machine (JVM) must be installed on the machine hosting your single-node setup before Hadoop can run. It is supported on both GNU/Linux and Windows.
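Because Hadoop needs a JVM, it can be useful to confirm that one is available before going further. The following is a minimal sketch using only the Python standard library. The helper names find_java and java_version_banner are hypothetical, introduced here for illustration, and the check assumes that the Java launcher is named java and is reachable through the PATH environment variable.

```python
import shutil
import subprocess


def find_java(executable="java"):
    """Return the full path to the Java launcher, or None if it is not on PATH.

    Assumption: the launcher is called `java`; pass another name if yours differs.
    """
    return shutil.which(executable)


def java_version_banner(java_path):
    """Run `java -version` and return the text it prints.

    Common JVMs write this banner to stderr, so stderr is checked first
    and stdout is used as a fallback.
    """
    result = subprocess.run(
        [java_path, "-version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    path = find_java()
    if path is None:
        print("No java executable found on PATH; install a JVM before running Hadoop.")
    else:
        print("Found Java at", path)
        print(java_version_banner(path))
```

This only shows that some JVM is installed. Whether its version suits a particular Hadoop release still has to be checked against the documentation for that release.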

Since the purpose of this chapter is to introduce Python programming for Apache Hadoop, a quick way to get our hands on a complete Hadoop ecosystem is ideal. Cloud vendor Cloudera hosts a number of free QuickStart VMs that contain a single-node Apache Hadoop cluster, complete with sample scripts and ready-made links to help us dive straight into managing our cluster. The following sections describe how to get a Hadoop VM running on ...
