Learning Path

Scala for Spark

Time to complete: 1h 45m

Published by O'Reilly Media, Inc.

Created January 2018

Description

IT organizations are tasked with processing and analyzing enormous (and growing) volumes of data. Spreading that work across clusters of computers has become the standard solution, and Spark has quickly become one of the top tools for this increasingly important job. Although Spark was designed to be used from multiple programming languages, it was purposely written in Scala. Scala code is concise and human-readable, and the language's features are designed to boost developer productivity, so you can express an idea directly instead of translating it into a less flexible API. By using Scala for Spark, you get the best performance and the most complete API coverage.

In this learning path, you’ll learn the basics of Scala so you can make the most of Spark. You’ll learn about Scala’s static typing and type inference, which let the Scala compiler catch more errors in your expressions at compile time instead of leaving them to surface at runtime. You’ll also see how to define the classes, functions, and variables you use to build Spark jobs. A basic understanding of Scala for Spark will also make it easier to debug problems when they do occur. If you have some programming background and want to use Spark to perform fast data analytics at scale, this learning path is for you.
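To make the compile-time checking concrete, here is a minimal sketch in plain Scala with no Spark dependency. The names `TypeInferenceDemo`, `longWords`, and `WordStats` are hypothetical, chosen only for illustration. It shows a value whose type is inferred, a function whose return type is inferred, a small class, and (commented out) a call the compiler would reject before the program ever runs.

```scala
// A minimal sketch of static typing with type inference (plain Scala, no Spark).
object TypeInferenceDemo {
  // The compiler infers Int for `threshold`; no annotation is needed.
  val threshold = 3

  // The parameter type is declared; the return type List[String] is inferred.
  def longWords(words: List[String]) = words.filter(_.length > threshold)

  // A small class whose field is set at construction time.
  class WordStats(val word: String) {
    def length: Int = word.length
  }

  def main(args: Array[String]): Unit = {
    val words = List("spark", "is", "fast", "and", "scalable")
    val result = longWords(words) // inferred as List[String]
    println(result)               // List(spark, fast, scalable)

    // Uncommenting the next line causes a compile-time type mismatch,
    // because longWords expects a List[String], not a List[Int]:
    // longWords(List(1, 2, 3))

    println(new WordStats("spark").length) // 5
  }
}
```

The same inference applies inside Spark code: the lambdas you pass to transformations are type-checked against the element type of your data, so many mistakes show up when you compile the job rather than partway through a cluster run.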

What you’ll learn—and how you can apply it

  • Basic Scala concepts that give you complete access to all of Spark’s features
  • How to define classes, variables, and functions, and how to use them to define Spark jobs
  • How features like pattern matching and case classes can make your Spark code concise and easy to understand
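As a rough illustration of the last point, the sketch below uses a case class with pattern matching in plain Scala. The `Event` record and the `describe` function are hypothetical examples, not part of any Spark API. In Spark, case classes like this are commonly used as the element type of a typed `Dataset`, and the same `match` logic can run inside a transformation.

```scala
// Hypothetical record type for illustration only.
case class Event(user: String, kind: String, amount: Double)

object PatternMatchDemo {
  // Pattern matching destructures each Event and branches on its fields;
  // the guard (`if amt > 100.0`) refines the first case.
  def describe(e: Event): String = e match {
    case Event(u, "purchase", amt) if amt > 100.0 => s"$u made a large purchase"
    case Event(u, "purchase", _)                  => s"$u made a purchase"
    case Event(u, other, _)                       => s"$u did $other"
  }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event("ana", "purchase", 250.0),
      Event("bo", "purchase", 20.0),
      Event("cy", "click", 0.0)
    )
    // Prints one description per event.
    events.map(describe).foreach(println)
  }
}
```

Because the last case matches any `Event`, the match is exhaustive, and case classes provide structural equality and a readable `toString` without extra code.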

This learning path is for you because…

  • You are a data scientist, business analyst, or engineer already using Spark from another language and want to improve your results by learning the basics of Scala (Spark’s native language) as they apply to Spark
  • You are a programmer interested in becoming a data scientist or engineer and want to learn how to use Spark quickly and effectively
  • You are a member of a team that works with big data and want to learn to use Scala to control Spark and achieve better analytical results

Prerequisites:

  • You should have programming experience in any language
  • Being familiar with Java is helpful, but not mandatory
  • Prior exposure to Spark is helpful, but not mandatory

Materials or downloads needed in advance: