7. Introduction to Analytics Engine (Spark) for Big Data

Overview

This chapter introduces the fundamentals of Apache Spark. You will learn to build a Spark pipeline by combining a sequence of transformations and actions, and to run it on a Spark cluster launched with Databricks. By the end of this chapter, you should be comfortable creating and running a Spark pipeline from a Databricks notebook.

Introduction

What makes Spark one of the most popular analytics engines? How did Spark evolve to become the parallel processing engine of choice? This chapter will help you get answers to these questions and more.

In the previous chapter, we learned about the various big data file formats, ...
