Real-Time Big Data Analytics by Shilpi Saxena, Sumit Gupta

Coding our first Spark SQL job

In this section, we discuss the basics of writing Spark SQL jobs in Scala and Java. Spark SQL exposes the DataFrame API (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame) for loading and analyzing datasets in various forms. It provides operations for loading and analyzing data from structured sources such as Hive, Parquet, and RDBMS, and it can also load data from semistructured formats such as JSON and CSV. In addition to the explicit operations exposed by the DataFrame API, Spark SQL can execute SQL queries against the data loaded into Spark.
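As a rough illustration of the two styles mentioned above (explicit DataFrame operations and SQL queries), here is a minimal sketch in Scala. It assumes Spark 2.x or later, where SparkSession is the entry point (Spark 1.x releases use SQLContext instead). The object name FirstSparkSqlJob, the helper adultNames, and the sample rows are hypothetical and used only for illustration. The data is built in memory so the example needs no input file.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names and sample data are illustrative only.
object FirstSparkSqlJob {

  // Returns the names of people older than 21, computed twice:
  // once with explicit DataFrame operations and once with a SQL query.
  def adultNames(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._ // enables toDF, the $"col" syntax, and encoders

    // Small in-memory dataset. A file-based job would instead use a
    // reader such as spark.read.json(path) or spark.read.parquet(path).
    val people = Seq(("Alice", 34), ("Bob", 19)).toDF("name", "age")

    // Style 1: explicit operations from the DataFrame API.
    val viaApi = people.filter($"age" > 21).select("name").as[String].collect().toSeq

    // Style 2: register the DataFrame as a temporary view and query it with SQL.
    people.createOrReplaceTempView("people")
    val viaSql = spark.sql("SELECT name FROM people WHERE age > 21").as[String].collect().toSeq

    (viaApi, viaSql)
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM; a cluster deployment would
    // normally leave the master to spark-submit.
    val spark = SparkSession.builder()
      .appName("FirstSparkSqlJob")
      .master("local[*]")
      .getOrCreate()

    val (viaApi, viaSql) = adultNames(spark)
    println(s"DataFrame API result: $viaApi")
    println(s"SQL query result: $viaSql")

    spark.stop()
  }
}
```

Both paths should return the same rows, since the SQL text and the DataFrame calls express the same filter and projection.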

Let's move ahead and code our first Spark SQL job in Scala and then we ...
