Spark Cookbook by Rishi Yadav

Chapter 4. Spark SQL

Spark SQL is a Spark module for processing structured data. This chapter is divided into the following recipes:

  • Understanding the Catalyst optimizer
  • Creating HiveContext
  • Inferring schema using case classes
  • Programmatically specifying the schema
  • Loading and saving data using the Parquet format
  • Loading and saving data using the JSON format
  • Loading and saving data from relational databases
  • Loading and saving data from an arbitrary source

Introduction

Spark can process data from various data sources, such as HDFS, Cassandra, HBase, and relational databases. Big data frameworks (unlike relational database systems) do not enforce a schema when data is written. HDFS is a perfect example where any arbitrary file is welcome during the ...
