Table of contents
- Foreword
- Preface
- 1. Introduction to Apache Spark: A Unified Analytics Engine
- 2. Downloading Apache Spark and Getting Started
- 3. Apache Spark’s Structured APIs
- 4. Spark SQL and DataFrames: Introduction to Built-in Data Sources
- 5. Spark SQL and DataFrames: Interacting with External Data Sources
- 6. Spark SQL and Datasets
- 7. Optimizing and Tuning Spark Applications
- 8. Structured Streaming
  - Evolution of the Apache Spark Stream Processing Engine
  - The Programming Model of Structured Streaming
  - The Fundamentals of a Structured Streaming Query
  - Streaming Data Sources and Sinks
  - Data Transformations
  - Stateful Streaming Aggregations
  - Streaming Joins
  - Arbitrary Stateful Computations
  - Performance Tuning
  - Summary
- 9. Building Reliable Data Lakes with Apache Spark
  - The Importance of an Optimal Storage Solution
  - Databases
  - Data Lakes
  - Lakehouses: The Next Step in the Evolution of Storage Solutions
  - Building Lakehouses with Apache Spark and Delta Lake
    - Configuring Apache Spark with Delta Lake
    - Loading Data into a Delta Lake Table
    - Loading Data Streams into a Delta Lake Table
    - Enforcing Schema on Write to Prevent Data Corruption
    - Evolving Schemas to Accommodate Changing Data
    - Transforming Existing Data
    - Auditing Data Changes with Operation History
    - Querying Previous Snapshots of a Table with Time Travel
  - Summary
- 10. Machine Learning with MLlib
- 11. Managing, Deploying, and Scaling Machine Learning Pipelines with Apache Spark
- 12. Epilogue: Apache Spark 3.0
- Index
- About the Authors
Product information
- Title: Learning Spark, 2nd Edition
- Publisher(s): O'Reilly Media, Inc.