Practical Predictive Analytics by Ralph Winters

Cluster computing

Cluster computing allows Spark to distribute data across many computers and process it in parallel. A cluster manager allocates the cluster's resources in response to user requests. An important aspect of Spark is that it tries to keep as much of the needed data in memory as possible, so that the data is quickly available to each analysis rather than being retrieved from storage every time a query or model is specified.
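The benefit of keeping data in memory can be illustrated with a small sketch in plain Python. This is not Spark code and does not use any Spark API: the `CachedDataset` class and the `load_from_storage` function are hypothetical names invented for this illustration, and the "storage" is just a hard-coded list. The sketch only shows the idea that data is read once and then reused by several analyses.

```python
class CachedDataset:
    """Toy stand-in for an in-memory cached dataset (not a Spark class)."""

    def __init__(self, loader):
        self._loader = loader   # function that reads the data from "storage"
        self._data = None       # filled in on first access
        self.load_count = 0     # how many times storage was actually read

    def data(self):
        # Read from storage only if the data is not already in memory.
        if self._data is None:
            self._data = self._loader()
            self.load_count += 1
        return self._data


def load_from_storage():
    # Hypothetical stand-in for a slow read from disk or a remote store.
    return [3, 1, 4, 1, 5, 9, 2, 6]


dataset = CachedDataset(load_from_storage)

# Two separate "analyses" reuse the same in-memory copy.
mean_value = sum(dataset.data()) / len(dataset.data())
max_value = max(dataset.data())

print(mean_value, max_value, dataset.load_count)
```

Although the data is accessed three times, the loader runs only once; every later access is served from memory.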

Spark stores data as resilient distributed datasets (RDDs), which allow collections of different kinds of objects to be partitioned across the machines in the cluster.
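The idea of spreading a collection over several workers and combining their partial results can be sketched as follows. This is a toy illustration in plain Python, not the RDD API: threads on one machine stand in for cluster nodes, and the function names (`split_into_partitions`, `partial_sum_of_squares`, `parallel_sum_of_squares`) are assumptions made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Deal the records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def partial_sum_of_squares(partition):
    """Work done independently on a single partition."""
    return sum(x * x for x in partition)


def parallel_sum_of_squares(records, num_partitions=4):
    partitions = split_into_partitions(records, num_partitions)
    # Each partition is processed by its own worker; threads stand in
    # for the machines of a cluster in this sketch.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_sum_of_squares, partitions))
    # Combine the per-partition results into the final answer.
    return sum(partials)


total = parallel_sum_of_squares(list(range(1, 11)))
print(total)
```

Each worker sees only its own partition, and only the small partial results are brought together at the end, which is the same general pattern a cluster uses at much larger scale.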
