Spark SQL and DataFrames
Before Apache Spark, Apache Hive was the go-to technology whenever anyone wanted to run an SQL-like query on a large amount of data. Apache Hive essentially translated SQL queries into MapReduce-like logic automatically, making it very easy to perform many kinds of analytics on big data without having to write complex code in Java or Scala.
With the advent of Apache Spark, there was a paradigm shift in how we can perform analysis at big data scale. Spark SQL provides an easy-to-use SQL-like layer on top of Apache Spark's distributed computation abilities. In fact, Spark SQL can be used as an online analytical processing (OLAP) database.
Spark SQL works by parsing the SQL-like statement into an Abstract ...