Mastering Spark for Structured Streaming

Video description

Spark is one of today’s most popular distributed computation engines for processing and analyzing big data. This course gives data engineers, data scientists, and data analysts who want to explore data streaming practical experience with Spark. In this hands-on course you’ll learn about the Spark Structured Streaming API, the powerful Catalyst query optimizer, the Tungsten execution engine, and more, and you’ll build several small applications that draw on the major features of Spark 2.0. Scala experience is not required, but the course works best for those who have some.

  • Understand the main features of Spark and its advantages over existing systems
  • Learn the basics of parallelism, streaming computation, and Spark Streaming
  • Explore the distinctions between Spark Structured Streaming and legacy DStream APIs
  • Understand how to write applications with the Spark Structured Streaming API
  • Learn about the Catalyst query optimizer and the Tungsten execution engine
  • Discover how Scala and Spark Structured Streaming simplify distributed streaming tasks
  • Gain hands-on experience building applications using Spark 2.0

Michael Li is the founder of The Data Incubator, which provides big data corporate training and a selective eight-week fellowship for PhDs transitioning into industry. Previously, he worked as a data scientist, software engineer, and researcher at Foursquare, Google, Andreessen Horowitz, J.P. Morgan, and NASA. He is a regular contributor to VentureBeat, The Next Web, and Harvard Business Review. Michael earned his Ph.D. at Princeton and was a Marshall Scholar at Cambridge.

Product information

  • Title: Mastering Spark for Structured Streaming
  • Author(s): Tianhui Michael Li
  • Release date: November 2016
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491974438