CHAPTER 7

image

Spark SQL

Ease of use is one of the reasons Spark became popular. It provides a simpler programming model than Hadoop MapReduce for processing big data. However, the number of people who are fluent in the languages supported by the Spark core API is a lot smaller than the number of people who know the venerable SQL.

SQL is an ANSI/ISO standard language for working with data. It specifies an interface for not only storing, modifying and retrieving data, but also for analyzing data. SQL is a declarative language. It is much easier to learn and use compared to general-purpose programming languages such as Scala, Java and Python. However, ...

Get Big Data Analytics with Spark: A Practitioner’s Guide to Using Spark for Large-Scale Data Processing, Machine Learning, and Graph Analytics, and High-Velocity Data Stream Processing now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.