Introduction

Apache Spark is a framework for easy, distributed, at-scale computation. Some refer to it as a “compute grid” or a “compute framework”; either term fits the underlying premise that Spark makes it easy for developers to gain access to, and insight from, vast quantities of data.
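To make that premise concrete, here is a minimal sketch of the kind of program Spark enables: a distributed word count written in Scala. The application name and the input path data/input.txt are placeholders for illustration; the point is that the same few lines that run on a laptop can run unchanged against a cluster.

```scala
import org.apache.spark.sql.SparkSession

object QuickTaste {
  def main(args: Array[String]): Unit = {
    // Start a SparkSession; "local[*]" uses all local cores, but the same
    // code runs unchanged when pointed at a cluster.
    val spark = SparkSession.builder()
      .appName("QuickTaste")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input path: any plain-text file will do.
    val counts = spark.read.textFile("data/input.txt")
      .flatMap(line => line.split("\\s+")) // split each line into words
      .groupBy("value")                    // a Dataset[String]'s single column is named "value"
      .count()                             // distributed word count

    counts.orderBy($"count".desc).show(10) // show the ten most frequent words
    spark.stop()
  }
}
```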

Apache Spark was created by Matei Zaharia as a research project at the University of California, Berkeley, in 2009, and was released to the open source community in 2010. In 2013, Spark entered the Apache Software Foundation as an Incubator project, and it graduated to a Top-Level Project (TLP) in 2014, where it remains today.

Who This Book Is For

If you’ve picked up this book, we presume that you already have more than a passing interest in Apache Spark. We consider the intended audience to be developers, project leads for Spark applications, and system administrators (or DevOps engineers) who need to prepare a developed Spark application for the path to a production workflow.

What This Book Covers

This book covers various methodologies, components, and best practices for developing and maintaining a production-grade Spark application. That said, we presume that you already have an application (or at least a candidate application) scoped for production, as well as a solid foundation in Spark basics.

How This Book Is Structured

This book is divided into six chapters, with the aim of equipping readers with the following knowledge:

  • A deep understanding ...
