Video description
Apache Spark is an extremely powerful general-purpose distributed system that also happens to be extremely difficult to debug. This video, designed for intermediate-level Spark developers and data scientists, looks at some of the most common (and baffling) ways Spark can explode (e.g., out-of-memory exceptions, unbalanced partitioning, strange serialization errors, and errors inside your own code) and then provides a set of remedies for keeping those blow-ups under control. You'll pick up techniques for improving your own logging (and reducing your dependence on Spark's verbose logs); learn how to deal with fuzzy data; discover how to connect and use a debugger in a distributed environment; and learn to tell which Spark error messages are actually relevant.
- Understand why Spark is difficult to debug, the types of Spark failures, and how to recognize them
- Explore the differences between debugging single-node and distributed systems
- Learn the best debugging techniques for Spark and a framework for debugging
Holden Karau is an open source developer advocate at Google focusing on Apache Spark, Beam, and related big data tools. She is an in-demand speaker at O'Reilly Media's Strata + Hadoop conferences, a committer on the Apache Spark, SystemML, and Mahout projects, and the author of multiple O'Reilly titles including High Performance Spark and Learning Spark. She holds a bachelor's degree in math and computer science from the University of Waterloo.
Table of contents
- Debugging Apache Spark
  - Introduction
  - A Quick Re-cap of Spark's Design
  - Finding Your Logs in Spark (and Finding the Right Ones)
  - The DAG (Not to Be Confused with Dog) and Query Plan
  - Finding the Root Cause of an Error in Spark with Lazy Evaluation
  - A Summary of Common Spark Errors
  - Diagnosing Key-Skew Problems with Spark
  - Out of Memory Exceptions in Spark
  - Reading JVM stack traces for non-JVM developers
  - Serialization Errors in Spark
  - It's Not Always Spark's Fault: Debugging Errors inside of Transformations
  - Adding your own logging and using accumulators
  - Attaching Remote Debuggers to Spark
  - Next Steps: Testing and Monitoring
Product information
- Title: Debugging Apache Spark
- Author(s): Holden Karau
- Release date: November 2018
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492039167