Learning Spark core concepts

Let's understand the core concepts of Spark in this section. The main abstraction Spark provides is the Resilient Distributed Dataset (RDD), so we will look at what an RDD is and at the operations on RDDs that deliver in-memory performance and fault tolerance. First, though, let's look at the ways to work with Spark.
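
As a quick preview, here is a minimal sketch of working with an RDD from the Scala shell (it assumes a running shell session, where the SparkContext is already available as sc; the collection and variable names are illustrative only):

// Create an RDD by distributing a local collection across the cluster.
val numbers = sc.parallelize(1 to 100)

// Transformations such as filter are lazy: they only record lineage,
// which is also what makes RDDs fault tolerant.
val evens = numbers.filter(_ % 2 == 0)

// Actions such as count trigger the actual distributed computation.
println(evens.count())   // prints 50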

Ways to work with Spark

There are two ways to work with Spark: the Spark Shell and Spark Applications.

Spark Shell

Interactive REPL (read-eval-print loop) for data exploration using Scala, Python, or R:

// Enter the Scala shell; type :q to exit the shell.
[cloudera@quickstart spark-2.0.0-bin-hadoop2.7]$ bin/spark-shell

# Enter the Python shell; press Ctrl+D to exit the shell.
[cloudera@quickstart spark-2.0.0-bin-hadoop2.7]$ ...
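
Once inside the shell, you can explore data interactively. Here is a minimal sketch of a Scala shell session (it assumes a text file at /tmp/data.txt, a hypothetical path used only for illustration):

// Load a text file into an RDD; sc is provided by the shell.
scala> val lines = sc.textFile("/tmp/data.txt")

// Actions trigger computation: count the lines, then show the first one.
scala> lines.count()
scala> lines.first()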
