Hadoop with Python

Hadoop is mostly written in Java, but that doesn't exclude the use of other programming languages with this distributed storage and processing framework, particularly Python. With this concise book, you'll learn how to use Python with the Hadoop Distributed File System (HDFS), MapReduce, the Apache Pig platform and Pig Latin script, and the Apache Spark cluster-computing framework.

Authors Zachary Radtka and Donald Miner from the data science firm Miner & Kasch take you through the basic concepts behind Hadoop, MapReduce, Pig, and Spark. Then, through multiple examples and use cases, you'll learn how to work with these technologies by applying various Python tools.

  • Use the Python library Snakebite to access HDFS programmatically from within Python applications
  • Write MapReduce jobs in Python with mrjob, the Python MapReduce library
  • Extend Pig Latin with user-defined functions (UDFs) in Python
  • Use the Spark Python API (PySpark) to write Spark programs with Python
  • Use the Luigi Python workflow scheduler to manage MapReduce jobs and Pig scripts

Zachary Radtka, a platform engineer at Miner & Kasch, has extensive experience creating custom analytics that run on petabyte-scale data sets.

Donald Miner

Donald Miner is an avid user of Apache Hadoop and a practitioner of data science. He serves as Chief Technology Officer at ClearEdge IT Solutions, a company that provides Big Data professional services. He is the author of the O'Reilly book MapReduce Design Patterns, which is based on his experiences as a MapReduce developer. Donald has architected and implemented a number of mission-critical, large-scale Hadoop systems for the U.S. Government and Fortune 500 companies. He received his PhD in Computer Science from the University of Maryland, Baltimore County, where he focused on Machine Learning and Multi-Agent Systems. He lives in Maryland with his wife and two young sons. Twitter: @donaldpminer