Video description
In March 2016 at Strata in San Jose, CA, a standing-room-only audience of excited developers heard the first public overview of the dramatic changes coming to Apache Spark. Listen and watch as Databricks presenters Michael Armbrust and Tathagata Das run through the breakthrough concepts and technologies driving the Structured Streaming capabilities in Spark 2.0.
- Get a first-look preview of the breakthrough changes coming to Structured Streaming in 2.0
- Understand the unified input/output API that works with virtually any format (JSON, Parquet, etc.)
- Learn about Datasets – a new abstraction that eliminates large swaths of unnecessary code
- See how Spark 2.0 simplifies exploration of large data stores and ensures error-free production pipelines
- Learn about the Catalyst optimizer and Tungsten – new tools for efficient pipeline analysis
- Explore an end-to-end execution pipeline that allows difficult ad-hoc interactive queries and more
- Learn about streaming DataFrames and how they unify interactive analysis
- See a demo of Structured Streaming and learn why it was developed
Table of contents
- Apache Spark and Real-time Analytics: From Interactive Queries to Streaming
  - Achieving Real-time Analytics - Michael Armbrust (Databricks)
  - Develop Productively with Simple APIs - Michael Armbrust (Databricks)
  - Datasets with SQL - Michael Armbrust (Databricks)
  - Dynamic Datasets (DataFrames) - Michael Armbrust (Databricks)
  - Static Datasets - Michael Armbrust (Databricks)
  - Unified, Structured APIs - Michael Armbrust (Databricks)
  - Execute Efficiently: The Catalyst Optimizer and Tungsten Engine - Michael Armbrust (Databricks)
  - Operating Directly on Serialized Data - Michael Armbrust (Databricks)
  - Understanding Encoders - Michael Armbrust (Databricks)
  - Streaming: Updating Automatically - Michael Armbrust (Databricks)
  - Demo: Structured Streaming in Spark 2.0 - Michael Armbrust (Databricks)
- Taking Spark Streaming to the Next Level with DataFrames
  - Streaming in Spark - Tathagata Das (Databricks)
  - Lessons from: Pain Points with DStreams - Tathagata Das (Databricks)
  - Building Structured Streaming - Tathagata Das (Databricks)
  - Continuous Aggregations - Tathagata Das (Databricks)
  - Execution - Tathagata Das (Databricks)
  - Advantages over DStreams - Tathagata Das (Databricks)
  - Stateful Stream Processing - Tathagata Das (Databricks)
  - Plan for Spark 2.0 - Tathagata Das (Databricks)
Product information
- Title: Apache Spark 2.0
- Author(s):
- Release date: May 2016
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491963975