Big Data Analytics with Java by Rajat Mehta

Basic analysis of data with Spark SQL

Spark SQL is a Spark module for structured data processing. Most developers already know SQL, and Spark SQL provides a SQL interface to your Spark data (RDDs). Using Spark SQL, you can run SQL queries or SQL-like queries on your big data set and fetch the results in objects called dataframes.
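As a rough illustration of running a SQL query and getting the result back as a dataframe, here is a minimal sketch. It assumes Spark 2.0 or later on the classpath and a local master; the class name, the `cars` view, its columns, and the sample rows are all hypothetical and are not taken from the book.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class SparkSqlSketch {

    // Runs one SQL query over a tiny in-memory data set and returns the matching makes.
    static List<String> expensiveMakes() {
        SparkSession spark = SparkSession.builder()
                .appName("spark-sql-sketch")   // hypothetical application name
                .master("local[*]")            // assumption: run locally, not on a cluster
                .getOrCreate();
        try {
            // Hypothetical two-column schema: make (string) and price (integer).
            StructType schema = DataTypes.createStructType(Arrays.asList(
                    DataTypes.createStructField("make", DataTypes.StringType, false),
                    DataTypes.createStructField("price", DataTypes.IntegerType, false)));

            // Hypothetical sample rows, for illustration only.
            List<Row> rows = Arrays.asList(
                    RowFactory.create("Ford", 20000),
                    RowFactory.create("Ford", 30000),
                    RowFactory.create("Audi", 40000));

            // Build a dataframe and expose it to SQL under the name "cars".
            Dataset<Row> cars = spark.createDataFrame(rows, schema);
            cars.createOrReplaceTempView("cars");

            // The query result is itself a dataframe (Dataset<Row> in Java).
            Dataset<Row> result = spark.sql(
                    "SELECT make FROM cars WHERE price > 25000 ORDER BY make");

            // Collect the small result to the driver as plain strings.
            List<String> makes = new ArrayList<>();
            for (Row row : result.collectAsList()) {
                makes.add(row.getString(0));
            }
            return makes;
        } finally {
            spark.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println(expensiveMakes());
    }
}
```

Registering the dataframe as a temporary view is what makes it addressable by name from the SQL string.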

A dataframe is like a relational database table. It has columns, and we can apply functions such as groupBy to those columns. It is easy to learn and use.
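To make the table analogy concrete, the following plain-Java sketch (no Spark involved) shows what a groupBy over one column computes on table-like rows. The `Car` class, its columns, and the sample data are hypothetical. It only mirrors the idea; a Spark dataframe would do the same work in a distributed way.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupBySketch {

    // One row of a hypothetical "cars" table with two columns: make and price.
    static final class Car {
        final String make;
        final int price;

        Car(String make, int price) {
            this.make = make;
            this.price = price;
        }
    }

    // Similar in spirit to: SELECT make, AVG(price) FROM cars GROUP BY make
    static Map<String, Double> averagePriceByMake(List<Car> cars) {
        return cars.stream().collect(Collectors.groupingBy(
                (Car c) -> c.make,                          // the grouping column
                TreeMap::new,                               // keep groups sorted by key
                Collectors.averagingInt((Car c) -> c.price) // aggregate applied per group
        ));
    }

    public static void main(String[] args) {
        List<Car> cars = Arrays.asList(
                new Car("Ford", 20000),
                new Car("Ford", 30000),
                new Car("Audi", 40000));
        System.out.println(averagePriceByMake(cars)); // prints {Audi=40000.0, Ford=25000.0}
    }
}
```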

In the next section, we will cover a few examples of how to use dataframes to run regular analysis tasks.

Building SparkConf and context

This is just boilerplate code, and it is the entry point for our Spark SQL code. Every Spark program
