Chapter 7. Spark 2.0 Concepts
Now that you have seen the fundamental underpinnings of Spark, let's take a broader look at the architecture, context, and ecosystem in which Spark operates. This is a catch-all chapter covering a diverse set of essential topics that will give you a fuller understanding of Spark as a whole. By the end of it, you will understand who is using Spark, and how and where it is being used. This chapter covers the following topics:
- The datasets accompanying this book and the IDEs for data wrangling
- A quick description of what a data scientist expects from Spark
- The Data Lake architecture and where Spark fits within it
- The evolution of the Spark architecture up to 2.0
- The Parquet data storage mechanism ...