Basic analysis of data with Spark SQL
Spark SQL is a Spark module for structured data processing. Most developers already know SQL, and Spark SQL provides an SQL interface to your Spark data (RDDs). Using Spark SQL, you can run SQL queries or SQL-like queries on your big dataset and fetch the results in objects called dataframes.
A dataframe is like a relational database table. It has columns, and we can apply functions such as groupBy to these columns. It is very easy to learn and use.
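To make the idea concrete, here is a minimal sketch of a dataframe being grouped and queried. It assumes Spark 2.x or later, where the Java API represents a dataframe as Dataset<Row> and SparkSession is the entry point. The class name DataFrameSketch, the employees view, and the dept and salary columns are hypothetical names invented for this illustration only.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class DataFrameSketch {

    // Builds a tiny in-memory dataframe. The "dept" and "salary" columns
    // are made up for this sketch; real data would come from a file or table.
    static Dataset<Row> sampleEmployees(SparkSession spark) {
        List<Row> rows = Arrays.asList(
            RowFactory.create("sales", 100),
            RowFactory.create("sales", 200),
            RowFactory.create("hr", 150));
        StructType schema = new StructType()
            .add("dept", DataTypes.StringType)
            .add("salary", DataTypes.IntegerType);
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        // Local session for experimentation; a cluster deployment would
        // normally set the master outside the code.
        SparkSession spark = SparkSession.builder()
            .appName("DataFrameSketch")
            .master("local[*]")
            .getOrCreate();

        Dataset<Row> employees = sampleEmployees(spark);

        // Dataframe API: count the rows in each department.
        employees.groupBy("dept").count().show();

        // SQL interface: register the dataframe as a temporary view,
        // then query it with ordinary SQL.
        employees.createOrReplaceTempView("employees");
        spark.sql("SELECT dept, SUM(salary) AS total FROM employees GROUP BY dept").show();

        spark.stop();
    }
}
```

Both calls express a grouping over the same data; which style to use is largely a matter of taste, since each returns another dataframe that can be processed further.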
In the next section, we will cover a few examples of how we can use dataframes to run common analysis tasks.
Building SparkConf and context
This is just boilerplate code that serves as the entry point for our Spark SQL code. Every Spark program