Hands-On Software Engineering with Python

by Nimesh Verma, Brian Allbee
October 2018
Beginner to intermediate
736 pages
17h 39m
English
Packt Publishing

Python, Hadoop, and Spark

Hadoop is probably the most common, or most popular, of the available large-scale cluster computing frameworks. It is a collection of software that provides cluster computing capabilities across networked computers, along with a distributed storage mechanism (the Hadoop Distributed File System, or HDFS) that can be thought of as a network-accessible filesystem.

Among the utilities it provides is Hadoop Streaming (https://hadoop.apache.org/docs/r1.2.1/streaming.html), which allows Map/Reduce jobs to be created and run using any executable or script as the mapper and/or reducer. Hadoop's operational model, at least for processes that can use Streaming, is file-centric, so processes written in Python and executed under Hadoop will tend to fall into ...
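As a rough illustration of how a Python script can act as a Streaming mapper and reducer, here is a minimal word-count sketch. It relies on the standard Streaming conventions: input records arrive on stdin, output records go to stdout, the text before the first tab is treated as the key, and reducer input arrives sorted by key. The script name `wordcount.py`, the `map`/`reduce` command-line argument, and the lowercasing of words are hypothetical choices made for this example, not the book's code.

```python
#!/usr/bin/env python3
"""Minimal word-count mapper and reducer for Hadoop Streaming (a sketch).

Hypothetical usage: `wordcount.py map` or `wordcount.py reduce`.
"""
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per word in each input line."""
    for line in lines:
        for word in line.split():
            # Lowercasing is an assumption for this example.
            yield f"{word.lower()}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each key; assumes input is already sorted by key."""
    # Skip blank lines, then split each record on the first tab only.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    # groupby only merges adjacent keys, which is why sorted input matters.
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def main(stage: str) -> None:
    """Run the requested stage over stdin, writing records to stdout."""
    if stage not in ("map", "reduce"):
        raise SystemExit("usage: wordcount.py map|reduce")
    process = mapper if stage == "map" else reducer
    for record in process(sys.stdin):
        sys.stdout.write(record + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Outside of Hadoop, the same pipeline can be approximated locally with something like `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`, where `sort` stands in for the shuffle-and-sort step Hadoop performs between the two stages.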


ISBN: 9781788622011