Apache Spark for Data Science Cookbook by Padma Priya Chitturi

Chapter 1. Big Data Analytics with Spark

In this chapter, we will cover the components of Spark. You will learn about them through the following recipes:

  • Initializing SparkContext
  • Working with Spark's Python and Scala shells
  • Building standalone applications
  • Working with the Spark programming model
  • Working with pair RDDs
  • Persisting RDDs
  • Loading and saving data
  • Creating broadcast variables and accumulators
  • Submitting applications to a cluster
  • Working with DataFrames
  • Working with Spark Streaming

Introduction

Apache Spark is a general-purpose distributed computing engine for large-scale data processing. It began as an open source project at UC Berkeley's AMPLab, was later donated to the Apache Software Foundation, and is now one of the foundation's top-level projects. ...