Hadoop is mostly written in Java, but that doesn't exclude the use of other programming languages with this distributed storage and processing framework, particularly Python. With this concise book, you'll learn how to use Python with the Hadoop Distributed File System (HDFS), MapReduce, the Apache Pig platform and Pig Latin script, and the Apache Spark cluster-computing framework.
Authors Zachary Radtka and Donald Miner from the data science firm Miner & Kasch take you through the basic concepts behind Hadoop, MapReduce, Pig, and Spark. Then, through multiple examples and use cases, you'll learn how to work with these technologies by applying various Python tools.
- Use the Python library Snakebite to access HDFS programmatically from within Python applications
- Write MapReduce jobs in Python with mrjob, the Python MapReduce library
- Extend Pig Latin with user-defined functions (UDFs) in Python
- Use the Spark Python API (PySpark) to write Spark programs with Python
- Learn how to use the Luigi Python workflow scheduler to manage MapReduce jobs and Pig scripts
Zachary Radtka, a platform engineer at Miner & Kasch, has extensive experience creating custom analytics that run on petabyte-scale data sets.
Table of contents
- Source Code
- 1. Hadoop Distributed File System (HDFS)
- 2. MapReduce with Python
- 3. Pig and Python
- 4. Spark with Python
- 5. Workflow Management with Python
Product information
- Title: Hadoop with Python
- Release date: October 2015
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491942260