Chapter 2. Getting Started with Apache Hadoop and Apache Spark

In this chapter, we will cover the basics of Hadoop and Spark, see how Spark differs from MapReduce, and get started with installing clusters and setting up the tools needed for analytics.

This chapter is divided into the following subtopics:

Introducing Apache Hadoop
Introducing Apache Spark
Discussing why we use Hadoop with Spark
Installing Hadoop and Spark clusters

Introducing Apache Hadoop

Apache Hadoop is a software framework that enables distributed processing on large clusters with thousands of nodes and petabytes of data. Apache Hadoop clusters can be built using commodity hardware where failure rates are generally high. Hadoop is designed to handle these failures gracefully ...

Big Data Analytics by Venkat Ankam