Chapter 4 introduced the foundational elements of the Spark SQL module, including the core abstraction, structured operations for manipulating structured data, and support for reading data from and writing data to a variety of data sources. Building on that foundation, this chapter covers some of the advanced capabilities of the Spark SQL module and takes a peek behind the curtain to explain the optimizations and execution efficiency that the Catalyst optimizer and the Tungsten engine provide. To help you perform complex analytics, Spark SQL provides a set of powerful and flexible ...