Live Online Training

Write Your First Spark Program in Java

Going Head First into Spark

Jesse Anderson

You don’t need to learn Scala or Python to use Spark. Java 8 lambdas let you write concise and clear Spark code. Designed for absolute beginners to Spark, this course focuses on the information that developers and technical teams need to be successful when writing a Spark program. You’ll learn about Spark, Java 8 lambdas, and how to use Spark’s many built-in transformation functions. We’ll also cover the pros and cons of using Java versus Scala. By the end of the course, you’ll be able to write a simple Spark program, write a Java 8 lambda, and use Spark transforms to process data.

What you'll learn and how you can apply it

By the end of this course…

You'll understand:

  • What Spark is and how to use it.
  • What Java 8 lambdas are and how to use them.
  • The pros and cons of using Java versus Scala.

And you'll be able to:

  • Write a simple Spark program.
  • Write a Java 8 lambda.
  • Use Spark transforms to process data.
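
As a rough preview of what a Java 8 lambda looks like, the sketch below writes the same function twice: once as a pre-Java 8 anonymous inner class and once as a lambda. It uses only the standard `java.util.stream` library rather than Spark, and the class name `LambdaSketch` and its members are hypothetical names chosen for illustration; they are not taken from the course materials. Spark's Java API accepts lambdas in a similar spirit, but this is only an analogy under that assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of the course materials.
public class LambdaSketch {

    // Pre-Java 8 style: the function is an anonymous inner class.
    static final Function<String, Integer> LENGTH_ANONYMOUS =
            new Function<String, Integer>() {
                @Override
                public Integer apply(String word) {
                    return word.length();
                }
            };

    // Java 8 style: the same function written as a lambda.
    static final Function<String, Integer> LENGTH_LAMBDA = word -> word.length();

    // Applies the given function to every word, similar in shape to a "map" transform.
    static List<Integer> lengths(List<String> words, Function<String, Integer> fn) {
        return words.stream().map(fn).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "java", "lambda");
        System.out.println(lengths(words, LENGTH_ANONYMOUS)); // [5, 4, 6]
        System.out.println(lengths(words, LENGTH_LAMBDA));    // [5, 4, 6]
    }
}
```

Both calls produce the same result; the lambda simply removes the boilerplate around the single `apply` method.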

This training course is for you because...

  • You are a software engineer and you need to write Apache Spark code.
  • You are a software architect who wants to understand how data flows through Apache Spark.
  • You are a business analyst who needs to write a more complex analysis with Apache Spark.
  • You are a business intelligence analyst who wants to run complex analytics at scale.
  • You are a quality assurance engineer who needs to test Apache Spark code.

Prerequisites


  • All attendees will need a technical background. To complete the programming exercises, attendees must be able to program in one of the following languages: Java, Ruby, Python, or Perl.
  • If you are taking this class at your place of work, verify with your network administrator that you can access ports 4822 and 8080. If those ports aren't open, please ask your network administrator to open them.

VIRTUAL MACHINE SETUP INSTRUCTIONS NEEDED PRIOR TO CLASS https://www.dropbox.com/s/tnd4agg4sv2lkef/Write_your_first_Spark_program_in_Java_VM_Instructions.pdf?dl=0

Please be sure to download the class VM instructions before class begins: https://www.dropbox.com/s/4tin1w3t76o2c44/Write_your_first_Spark_program_in_Java_VM_Instructions%20%281%29.pdf?dl=0

About your instructor

  • Jesse Anderson is the Managing Director at Big Data Institute. He provides Big Data training at companies ranging from startups to the Fortune 100. This includes training on cutting-edge technologies like Apache Kafka, Apache Hadoop, and Apache Spark. He has taught thousands of students the skills to become Data Engineers.

    He is widely regarded as an expert in the field and for his novel teaching practices. Jesse is published by O’Reilly and Pragmatic Programmers. He has been covered in prestigious publications such as The Wall Street Journal, CNN, BBC, NPR, Engadget, and Wired.

Schedule


The timeframes are only estimates and may vary according to how the class is progressing.

  • Introduction (10 mins)
  • About Spark - We’ll review the basics of Spark and how it works. (40 mins)
  • Spark Java API - You’ll learn how to create simple Spark jobs using the Java API for Spark. (1 hour, 10 mins)
  • Spark Java Exercise - We’ll complete an exercise that enables you to write your first Spark job. (1 hour)