Chapter 6. Spark Components and Packages
Spark has many components designed to work together as an integrated system, and many of them are distributed as part of Spark. This differs from much of the rest of the Hadoop ecosystem, which has a different project or system for each task. You’ve already seen how to effectively use the Spark Core, SQL, Streaming, and ML components. This chapter looks at projects outside of Spark itself: the external or community components, often called packages. Having a largely integrated system benefits Spark in several ways: with fewer dependencies and systems to keep track of, deployment and cluster management, upgrades, and application development all become simpler.
While Spark is comparatively integrated, there are still times when bringing in outside components is well worth the increased complexity. In this chapter, we’ll help ...