In this event, we'll examine Spark SQL, a new Alpha component that is part of the Apache Spark 1.0 release. Spark SQL lets developers natively query data stored in both existing RDDs and external sources such as Apache Hive. A key feature of Spark SQL is the ability to blur the lines between relational tables and RDDs, making it easy for developers to intermix SQL commands that query external data with complex analytics. In addition to Spark SQL, we'll explore the Catalyst optimizer framework, which allows Spark SQL to automatically rewrite query plans to execute more efficiently.
- Title: Performing Advanced Analytics on Relational Data with Spark SQL
- Release date: July 2014
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 978149190828