In this event, we'll examine Spark SQL, a new Alpha component that is part of the Apache Spark 1.0 release. Spark SQL lets developers natively query data stored in both existing RDDs and external sources such as Apache Hive. A key feature of Spark SQL is the ability to blur the lines between relational tables and RDDs, making it easy for developers to intermix SQL commands that query external data with complex analytics. In addition to Spark SQL, we'll explore the Catalyst optimizer framework, which allows Spark SQL to automatically rewrite query plans to execute more efficiently.
- Title: Performing Advanced Analytics on Relational Data with Spark SQL
- Release date: July 2014
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 978149190828
You might also like
Stream Processing with Apache Spark
Before you can build analytics tools to gain quick insights, you first need to know how …
O'Reilly Strata Data Conference 2019 - New York, New York
The 2019 Strata Data Conference NYC, the biggest Big Data conference in the world, was a …
Practical Statistics for Data Scientists, 2nd Edition
Statistical methods are a key part of data science, yet few data scientists have formal statistical …
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition
Through a series of recent breakthroughs, deep learning has boosted the entire field of machine learning. …