O'Reilly logo

Python Parallel Programming Cookbook by Giancarlo Zaccone

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Using MapReduce with Disco

Disco is a Python module based on the MapReduce framework introduced by Google, which allows the management of large distributed data in computer clusters. The applications written using Disco can be performed in the economic cluster of machines with a very short learning curve. In fact, the technical difficulties related to the processes that are distributed as load balancing, job scheduling, and the communications protocol are completely managed by Disco and hidden from the developer.

The typical applications of this module are essentially as follows:

  • Web indexing
  • URL access counter
  • Distributed sort
    Using MapReduce with Disco

    The MapReduce schema

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required