Hadoop, 2nd Edition
Hadoop and its ecosystem are the de facto standard in the data industry for large-scale data processing. This learning path covers the core technologies for creating Hadoop clusters and what you need to ingest, parse, access, and analyze your data at scale. Technologies such as Hive, Pig, MapReduce, and YARN are covered, as well as techniques and best practices for integrating these technologies to implement complete solutions.
Large datasets are difficult to process and can demand a massive amount of processing power. Hadoop and the ecosystem surrounding it make large-scale processing of datasets manageable and shorten the time that processing takes. Business and research increasingly rely on actionable results from the rapid analysis of large amounts of data, so the need for distributed processing solutions with Hadoop as a framework is paramount.
This learning path contains self-assessments: short, multiple-choice quizzes that you'll take as you work through the learning path. They give you quick insight into how you're doing and take the guesswork out of learning.
What you’ll learn—and how you can apply it
- Fundamentals of working with Hadoop, the HDFS architecture, setting up the Hadoop infrastructure (Hive, Pig, and Impala), and importing data into and exporting data from your framework
- Scheduling, running, and monitoring applications with Hadoop YARN
- How to create and query datasets with SQL using Apache Hive
- The core concepts and methodologies behind using Hadoop
- How to use Hadoop commands and build structured applications to analyze and manage your data
- Using Hadoop to architect a complete end-to-end solution by working through a real-world case study of a clickstream analytics engine
This Learning Path is for you because…
- You're a technical professional who needs to understand core Hadoop technologies for your job, or for a prospective job
- You're a developer tasked with creating or managing a Hadoop application and need to understand what the ecosystem projects are and how to use them
- You're a data scientist who needs to manage or analyze large amounts of data in the most effective way possible
Materials or downloads needed in advance: Supplemental Content