We can use map/reduce to estimate Pi. The idea is Monte Carlo sampling: we draw random points in the unit square, and the fraction that lands inside the quarter circle of radius 1 approaches Pi/4, so multiplying that fraction by 4 gives an estimate of Pi. Suppose we have code like this:
import pyspark
import random

if 'sc' not in globals():
    sc = pyspark.SparkContext()

NUM_SAMPLES = 1000

def sample(p):
    # Draw a random point in the unit square; return 1 if it falls
    # inside the quarter circle of radius 1, else 0.
    x, y = random.random(), random.random()
    return 1 if x*x + y*y < 1 else 0

count = sc.parallelize(range(NUM_SAMPLES)) \
    .map(sample) \
    .reduce(lambda a, b: a + b)

print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))
This code has the same preamble as the previous example. We use the random Python package to draw uniform points, and we define a constant, NUM_SAMPLES, for the number of points to attempt.
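Since the accuracy of a Monte Carlo estimate depends directly on the sample count, it can help to see that dependence locally before involving Spark. Here is a minimal, Spark-free sketch; the estimate_pi helper and the sample sizes are illustrative choices, not part of the original example:

import random

def estimate_pi(num_samples):
    # Count random points in the unit square that fall inside the
    # quarter circle, then scale the fraction by 4.
    inside = sum(1 for _ in range(num_samples)
                 if random.random()**2 + random.random()**2 < 1)
    return 4.0 * inside / num_samples

for n in (1000, 100000, 1000000):
    print("%8d samples -> %f" % (n, estimate_pi(n)))

Each extra order of magnitude in the sample count tightens the estimate, which is exactly why distributing the sampling across a cluster is attractive.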
We call the parallelize function to split the samples across the available nodes, which builds an RDD. The map step applies the sample function to each element, producing a 1 or 0 for every point. Finally, we reduce the mapped values by summing them into count, and scale the total by 4/NUM_SAMPLES to print the estimate of Pi.
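If you want more control over how the work is split up, parallelize accepts an optional numSlices argument that sets the number of partitions (and hence parallel tasks). A brief sketch, reusing sc, sample, and NUM_SAMPLES from the example above; the slice count of 8 is an arbitrary illustrative choice:

# Split the samples into 8 partitions explicitly rather than relying
# on Spark's default partitioning.
count = sc.parallelize(range(NUM_SAMPLES), numSlices=8) \
    .map(sample) \
    .reduce(lambda a, b: a + b)

print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))

More partitions give Spark more tasks to schedule across executors, at the cost of per-task overhead; for a job this small, the default is usually fine.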