High Performance Spark, 2nd Edition

by Holden Karau, Adi Polak, Rachel Warren
May 2026
Intermediate to advanced
350 pages
2h 50m
English
O'Reilly Media, Inc.
Content preview from High Performance Spark, 2nd Edition

Chapter 6. Spark Components and Packages

Spark has many components designed to work together as an integrated system, and many of them are distributed as part of Spark. This differs from much of the rest of the Hadoop ecosystem, which has a different project or system for each task. You’ve already seen how to use the Spark Core, SQL, Streaming, and ML components effectively. This chapter looks at the projects outside of Spark itself: the external or community components, often called packages. Having a largely integrated system simplifies deployment and cluster management, upgrades, and application development, because there are fewer dependencies and systems to keep track of.

While Spark is comparatively integrated, there are still times when bringing in outside components is well worth the increased complexity. In this chapter, we’ll help ...



Publisher Resources

ISBN: 9781098145842