Integrating Python with large-scale, cluster computing frameworks

To stay as compatible as possible with custom-written operations, large-scale cluster computing frameworks typically accept input in one of two ways: as command-line arguments or through standard input, with the latter being more common in systems aimed at big-data operations. In either case, running a custom process in a clustered environment, and scaling it there, requires a self-contained command-line executable, which usually returns its data to standard output.
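As a rough sketch of what such an executable might look like, the script below handles both input routes and writes its results to standard output. The `process` function and its upper-casing behavior are placeholders invented for illustration, not an operation any particular framework requires, and the optional `argv`, `stdin`, and `stdout` parameters of `main` are an assumption made only so the logic can be exercised without a real shell.

```python
#!/usr/bin/env python
"""Hypothetical stand-in for a custom operation run by a cluster framework."""
import sys


def process(item):
    # Placeholder operation (an assumption for illustration only):
    # strip surrounding whitespace and upper-case the text.
    return item.strip().upper()


def main(argv=None, stdin=None, stdout=None):
    # Fall back to the real process arguments and streams when none are given.
    argv = sys.argv[1:] if argv is None else argv
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    # Use command-line arguments when any are supplied;
    # otherwise read standard input one line at a time.
    items = argv if argv else stdin
    for item in items:
        stdout.write(process(item) + '\n')


if __name__ == '__main__':
    main()
```

Saved under a hypothetical name such as `upper.py`, this could be run as `python upper.py "some text"`, as `echo "some text" | python upper.py`, or as `python upper.py < input.txt`. Reading standard input line by line, rather than all at once, keeps memory use flat when the incoming data is large.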

A minimal script that accepts standard input, whether the data is piped into it or read from the contents of a file, could be ...
