Video description
Spark is one of today’s most popular distributed computation engines for processing and analyzing big data. This course gives data engineers, data scientists, and data analysts interested in data streaming practical experience in using Spark. You’ll learn about the Spark Structured Streaming API, the powerful Catalyst query optimizer, the Tungsten execution engine, and more in this hands-on course, where you’ll build several small applications that leverage all the aspects of Spark 2.0. While not a requirement, the course works best for those with some Scala experience.
- Understand the main features of Spark and its advantages over existing systems
- Learn the basics of parallelism, streaming computation, and Spark Streaming
- Explore the distinctions between Spark Structured Streaming and legacy DStream APIs
- Understand how to write to and use the Spark Structured Streaming API
- Learn about the new Catalyst query optimizer and the Tungsten execution engine
- Discover how Scala and Spark Structured Streaming simplify distributed streaming tasks
- Gain hands-on experience building applications using Spark 2.0
Michael Li is the founder of The Data Incubator, which provides big data corporate training and a selective eight-week fellowship for PhDs transitioning into industry. Previously, he worked as a data scientist, software engineer, and researcher at Foursquare, Google, Andreessen Horowitz, J.P. Morgan, and NASA. He is a regular contributor to VentureBeat, The Next Web, and Harvard Business Review. Michael earned his Ph.D. at Princeton and was a Marshall Scholar in Cambridge.
Table of contents
- Overview
- Spark Datasets and Structured Streaming
- Spark Structured Streaming
- Spark Structured Streaming
- Netcat Socket Structured Streaming Example
- Socket Structured Streaming Example
- Spark Structured Streaming Parsing Data
- Constructing Columns in Structured Streaming
- Selecting and Filtering Columns Using Structured Streaming
- GroupBy and Aggregation in Structured Streaming
- Joining Structured Stream with Datasets
- SQL Queries in Spark Structured Streaming
- DStream Comparison
- Comparing Structured Streaming with DStream
- Custom Receivers in Spark DStream
- Iterative Wordcount Using Spark DStream
- Cumulative Wordcount using Spark DStream
- Benefits of Spark Tungsten
- Tungsten Performance Benefit Demonstration
- Benefits of Spark Catalyst
- Viewing Query Plans in Spark Shell
- Visualizing Query Stages in Spark UI Viewer
- Viewing Spark Catalyst-Optimized Physical Plans
- Standalone Spark Streaming Applications
Product information
- Title: Mastering Spark for Structured Streaming
- Author(s):
- Release date: November 2016
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491974438