OBJECTIVE
CHAPTER 9
Working with Big Data in Python
9.1 Prerequisites
9.2 Basic Libraries in Python
9.3 Python Libraries for Dealing with Large Data Sets
9.4 Python-MapReduce Using Hadoop Streaming
In the previous chapter, we explored the possibility of
using R as an alternative tool for working with large data
sets. We analyzed in depth the capabilities of R for
handling large data sets and wrapped up with an
introduction to the integration of R with the Hadoop
ecosystem.
Continuing in a similar vein for Python, in this chapter
we shall explore the capabilities of Python for data
management. We shall start with a basic exposure to the
Python programming language and then build a deeper
understanding of the salient Python libraries for data
handling. We shall also look at a couple of important
Python features for handling large data and for parallel
computation. Finally, we shall get a quick introduction
to how Python can be integrated with the Hadoop
ecosystem.
