Learning Real-time Processing with Spark Streaming by Sumit Gupta

Your first Spark program

In this section, we will discuss the basic terminology used in Spark, and then code and deploy our first Spark application using Scala and Java.

Now that we have configured our Spark cluster, we are ready to code and deploy our Spark jobs. Before moving forward, though, let's talk about a few important components of Spark:

  • RDD: Spark is built around the concept of the RDD (Resilient Distributed Dataset). All data that needs to be processed in Spark must first be converted into an RDD, which is then loaded into the Spark cluster for further processing. An RDD is a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. Spark provides various ways to create RDDs such ...
