CHAPTER 6 Beyond Spark

Previous chapters covered the techniques required to run Spark effectively; they were the preparation for running Spark jobs. Now it is time to run our Spark application. Usually, this is a machine learning algorithm, an aggregation of log data, or a business intelligence workload. Spark can be applied in many areas, including business intelligence, data warehousing, recommendation, and fraud detection. Spark has a large and growing community, as well as an ecosystem of frameworks that is indispensable in enterprise environments. These frameworks provide functionality that has proven itself in a variety of production use cases. First, of course, we need to understand how to use Spark with the external libraries that make up this ecosystem.

In this chapter, we’ll introduce Spark case studies and frameworks covering topics such as data warehousing and machine learning. These are areas where Spark can really help you solve problems. Although Spark is a relatively new tool, a variety of use cases already exist, and we will cover them. These frameworks are crucial for your Spark applications, so follow along as we explore them.

Data Warehousing

Data analysis is at the core of making progress of any kind in business, and the data warehousing system plays a significant role in this type of analysis. Thanks to its many frameworks and its ecosystem, Spark can be a core component to provide ...
