Learning Spark core concepts
Let's understand the core concepts of Spark in this section. The main abstraction Spark provides is the Resilient Distributed Dataset (RDD), so we will look at what an RDD is and at the operations on RDDs that provide in-memory performance and fault tolerance. First, though, let's learn the ways to work with Spark.
Ways to work with Spark
There are two ways to work with Spark: the Spark Shell and Spark Applications.
Spark Shell
The Spark Shell is an interactive REPL (read-eval-print loop) for data exploration using Scala, Python, or R:
// Entering the Scala shell. Type :q to exit the shell.
[cloudera@quickstart spark-2.0.0-bin-hadoop2.7]$ bin/spark-shell

# Entering the Python shell. Press Ctrl+D to exit the shell.
[cloudera@quickstart spark-2.0.0-bin-hadoop2.7]$ bin/pyspark
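Once inside the Scala shell, you can explore data interactively. The following is a minimal sketch of a shell session; the file path is illustrative (point it at any local text file), and sc (the SparkContext) is created automatically by the shell:

scala> // sc (SparkContext) is provided automatically by spark-shell
scala> val lines = sc.textFile("file:///tmp/sample.txt")   // create an RDD of lines; path is illustrative
scala> val words = lines.flatMap(line => line.split(" "))  // transformation: split lines into words (lazy)
scala> words.count()                                       // action: triggers the actual computation

Note that transformations such as flatMap are evaluated lazily; nothing executes until an action such as count is called. We will cover transformations and actions in detail when we discuss RDD operations.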