
Learning Path: Hadoop, 2nd Edition


Hadoop and the Hadoop ecosystem are the de facto standard in the data industry for large-scale data processing. This Learning Path covers the core technologies for creating Hadoop clusters and what you need to ingest, parse, access, and analyze your data at scale. Technologies such as Hive, Pig, MapReduce, and YARN are covered, as well as techniques and best practices for integrating these technologies to implement complete solutions.

Large datasets are difficult to process and can consume a massive amount of processing power. Hadoop and the ecosystem surrounding it make large-scale processing of datasets manageable and shorten the time needed for that processing. Businesses and researchers increasingly rely on actionable results from the rapid analysis of large amounts of data, so the need for distributed processing solutions with Hadoop as a framework is paramount.

This Learning Path contains self-assessments: short, multiple-choice quizzes that you'll take as you work through the material. They give you quick insight into how you're doing and take the guesswork out of learning.

What you’ll learn—and how you can apply it

  • Fundamentals of working with Hadoop, the HDFS architecture, setting up the Hadoop infrastructure (Hive, Pig, and Impala), and importing data into and exporting data from your framework
  • Scheduling, running, and monitoring applications with Hadoop YARN
  • How to create and query datasets with SQL using Apache Hive
  • The core concepts and methodologies behind using Hadoop
  • How to use the commands and build structured applications to analyze and manage your data
  • Using Hadoop to architect a complete end-to-end solution by working through a real-world case study of a clickstream analytics engine

This Learning Path is for you because…

  • You're a technical professional who needs to understand core Hadoop technologies for your job, or for a prospective job
  • You're a developer tasked with creating or managing a Hadoop application and need to understand what the ecosystem projects are and how to use them
  • You're a data scientist who needs to manage or analyze large amounts of data in the most effective way possible

Prerequisites: None

Materials or downloads needed in advance: Supplemental Content

About the Publisher

Presented in stunning HD quality, the Infinite Skills range of video-based training provides a clear and concise way to learn computer applications and programming languages at your own speed. Delivered to your desktop, iPad, or iPhone, high-quality training is never more than a click away.

To increase retention and provide an intuitive learning experience, Infinite Skills formats the training in easy-to-follow, step-by-step lessons that build into a comprehensive learning resource, allowing even the most complex topics to be quickly mastered no matter what the user’s prior skill level.

The emphasis of all Infinite Skills products is on delivering affordable, high-quality training in a format that allows users to learn the real-life practical skills that are so important in today’s commercial environments.
