Apache Spark

Apache Spark is a fast, general-purpose cluster computing system, initially developed at AMPLab/UC Berkeley (http://en.wikipedia.org/wiki/UC_Berkeley) as part of the Berkeley Data Analytics Stack (BDAS). It provides high-level APIs for the following programming languages, which make large, concurrent parallel jobs easy to write and deploy [12:11]:

Scala: http://spark.apache.org/docs/latest/api/scala/index.html
Java: http://spark.apache.org/docs/latest/api/java/index.html
Python: http://spark.apache.org/docs/latest/api/python/index.html

Note

The link to the latest information

The URLs, like any reference to Apache Spark, may change in future versions.

The core element of Spark is a resilient distributed dataset (RDD), which is a collection ...

Scala:Applied Machine Learning by Pascal Bugnion, Patrick R. Nicolas, Alex Kozlov