7. Introduction to Analytics Engine (Spark) for Big Data
Overview
This chapter will help you learn the fundamentals of Apache Spark. By combining a sequence of transformations and actions, you will be able to create a pipeline in Spark and run it. We will be using Databricks to launch and use a Spark cluster. By the end of this chapter, you should be comfortable with creating and running a Spark pipeline using a Databricks notebook on a Spark cluster.
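To make the idea of combining transformations and actions concrete, here is a minimal sketch in plain Python. It is a toy model and not the Spark API: the class `ToyDataset` and its methods are hypothetical names chosen for illustration, and it runs on a single machine with no cluster. It assumes only the general behavior Spark is known for, namely that transformations such as `map` and `filter` are recorded lazily and that an action such as `collect` or `count` triggers the actual computation.

```python
# Toy model of a lazy pipeline. ToyDataset is a hypothetical class, NOT part of Spark.
class ToyDataset:
    def __init__(self, items, steps=()):
        self._items = list(items)
        self._steps = tuple(steps)

    # Transformations: record the step and return a new dataset. No work is done yet.
    def map(self, fn):
        return ToyDataset(self._items, self._steps + (("map", fn),))

    def filter(self, predicate):
        return ToyDataset(self._items, self._steps + (("filter", predicate),))

    def _run(self):
        # Apply every recorded step, in order, to each item.
        for item in self._items:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                yield item

    # Actions: execute the recorded steps and return a result.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []  # records each time the map function actually runs


def square(x):
    calls.append(x)
    return x * x


# Build the pipeline: two transformations chained together.
pipeline = ToyDataset(range(1, 6)).map(square).filter(lambda x: x % 2 == 1)
work_before_action = len(calls)  # nothing has executed yet

# Run the pipeline with an action.
result = pipeline.collect()
```

In this sketch, `square` is not called while the pipeline is being built; it runs only when `collect` is invoked. A real Spark pipeline follows the same pattern of building first and executing on an action, with the work distributed across a cluster.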
Introduction
What makes Spark one of the most popular analytics engines? How did Spark evolve to become the parallel processing engine of choice? This chapter will help you get answers to these questions and more.
In the previous chapter, we learned about the various big data file formats, ...