Chapter 4. Spark SQL

We've had a roller coaster ride so far. In the last chapter, we looked at performing ETL with Spark and, most importantly, at loading and saving data from and to various data sources. We've looked at structured data streams and NoSQL databases, and throughout we tried to keep our attention on using RDDs to work with such data sources. We touched briefly on the DataFrame and Dataset APIs but refrained from going into much detail, as we wanted to cover them fully in this chapter.

If you have a database background and are still trying to come to terms with the RDD API, this is the chapter you'll love the most, as it essentially explains how you can use SQL to exploit the capabilities of ...
