Live Online Training

Stream Processing and Beyond with Apache Flink

Mastering real-time, low-latency data processing

Topic: Data
Bowen Li

Time is value, yet it's exactly what legacy big data infrastructure leaves on the table. As businesses grow in a world that demands fast evolution and iteration, they need to generate more value from their real-time data to offer better service and interact with customers faster. That demand is driving a paradigm shift in big data: away from batch-oriented processing with latencies of hours or even days, toward the new norm of stream processing, where data is processed as it flows in, with very low latency, in a scalable, fault-tolerant fashion.

Apache Flink is one of the biggest drivers of this trend. Companies are moving to Flink for its streaming-first architecture and for the features and capabilities that help businesses grow and thrive at a faster pace.

Job requirements for software engineers and data engineers will change along with it. To stay ahead of this industry shift, it's critical for engineers and developers to understand the ideas behind Apache Flink. In this 3-hour course with hands-on exercises, you'll learn how to leverage Flink's building blocks to develop Flink applications, and how to apply the lessons and experience gained to your day-to-day job so you can process your data in real time like a pro.

What you'll learn, and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • Fundamental concepts in stream processing and Apache Flink
  • What makes Flink fast, reliable, and scalable
  • How to architect a native real-time data platform
  • How to write Flink applications with the DataStream API in Java and with Flink SQL

And you’ll be able to:

  • Set up a local Flink development and testing environment
  • Run an end-to-end application with Flink SQL and Kafka (a minimal sketch follows this list)
  • Stay up to date on what the Apache Flink community has been actively working on beyond stream processing, in fields like machine learning, AI, and serverless computation
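
As a taste of the end-to-end Flink SQL and Kafka exercise, here is a minimal sketch in Java using the Table API. It assumes Flink's Kafka connector and JSON format are on the classpath; the topic name, broker address, and schema are placeholders for whatever your environment provides.

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class KafkaSqlSketch {

        public static void main(String[] args) {
            TableEnvironment tableEnv =
                    TableEnvironment.create(EnvironmentSettings.inStreamingMode());

            // Declare a Kafka topic as a dynamic table; topic, brokers, group id,
            // and schema are placeholders for your own environment.
            tableEnv.executeSql(
                    "CREATE TABLE orders (" +
                    "  order_id STRING," +
                    "  amount DOUBLE," +
                    "  ts TIMESTAMP(3)," +
                    "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
                    ") WITH (" +
                    "  'connector' = 'kafka'," +
                    "  'topic' = 'orders'," +
                    "  'properties.bootstrap.servers' = 'localhost:9092'," +
                    "  'properties.group.id' = 'flink-training'," +
                    "  'scan.startup.mode' = 'earliest-offset'," +
                    "  'format' = 'json'" +
                    ")");

            // Continuous query: revenue per one-minute tumbling window, printed
            // as results become available.
            tableEnv.executeSql(
                    "SELECT window_start, window_end, SUM(amount) AS revenue " +
                    "FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '1' MINUTES)) " +
                    "GROUP BY window_start, window_end")
                    .print();
        }
    }

The CREATE TABLE statement maps the Kafka topic to a dynamic table, and the windowed SELECT runs as a continuous query that keeps emitting results as new records arrive.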

This training course is for you because...

  • You are a data engineer or software engineer who is eager to build real-time data pipelines and platforms
  • You are a product manager or business manager who wants to understand the use cases and functionality that Flink and stream processing offer
  • You want to become an expert in stream processing, real-time ETL, and Apache Flink

Prerequisites

  • To run the exercises, you will need a Flink cluster running locally on a Mac or Linux machine. Alternatively, on Windows you can run Flink in a Docker container. (This also works on Mac or Linux.)
  • Intermediate level experience with Java programming and SQL.
  • Some familiarity with big data tools or platforms like Hadoop.

Recommended preparation:

Download the latest Flink distribution, or the Flink Docker image, to run on your computer. As a warm-up, skim through these official docs:

  • https://flink.apache.org/
  • https://flink.apache.org/usecases.html
  • https://flink.apache.org/flink-applications.html
  • https://flink.apache.org/flink-operations.html
  • https://flink.apache.org/flink-architecture.html

Recommended follow-up:

About your instructor

  • Bowen is an Apache Flink committer and a senior engineer at Alibaba. He has been working on Flink for over three years, with extensive experience developing and operating Flink at Alibaba at an unprecedented scale.

    Besides committing code and reviewing designs, Bowen speaks frequently about Flink at conferences and events, evangelizing Flink and stream processing to make the world a little more real-time and data-driven, one talk at a time.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Course Overview (5 minutes)

  • Introduction and Welcome
  • Overview of stream processing
  • Demo of the final projects running

Introduction to Apache Flink and Stream Processing (40 minutes)

  • Presentation:
      • Use cases of Flink
      • Example architectures and pipelines based on Flink
  • Poll:
      • What is your job title and background?
      • What do you most want to learn from this course?
  • Presentation:
      • Event Time and Watermarks (illustrated in the sketch after this agenda)
      • Advanced Windowing
  • Q&A
  • Break (5 min)
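
The event-time and windowing topics in this agenda boil down to a few lines of DataStream code. Here is a minimal sketch, assuming a hypothetical ClickEvent type and a single inlined test element standing in for a real source such as Kafka.

    import java.time.Duration;

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class EventTimeWindowSketch {

        // Hypothetical event type, used only for this illustration.
        public static class ClickEvent {
            public String userId;
            public long timestampMillis;
        }

        // Turn each click into a (userId, 1) pair so the window can simply sum.
        public static class ToPair implements MapFunction<ClickEvent, Tuple2<String, Long>> {
            @Override
            public Tuple2<String, Long> map(ClickEvent event) {
                return Tuple2.of(event.userId, 1L);
            }
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Stand-in source; in the exercises the events would come from Kafka.
            ClickEvent click = new ClickEvent();
            click.userId = "user-1";
            click.timestampMillis = System.currentTimeMillis();
            DataStream<ClickEvent> clicks = env.fromElements(click);

            clicks
                    // Extract event time from each record and tolerate up to five
                    // seconds of out-of-orderness before the watermark advances.
                    .assignTimestampsAndWatermarks(
                            WatermarkStrategy
                                    .<ClickEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                                    .withTimestampAssigner((event, recordTs) -> event.timestampMillis))
                    // Count clicks per user in one-minute tumbling event-time windows.
                    .map(new ToPair())
                    .keyBy(pair -> pair.f0)
                    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                    .sum(1)
                    .print();

            env.execute("Event-time windowed click counts");
        }
    }

The watermark strategy tells Flink how far event time has progressed, which is what lets the one-minute windows fire correctly even when records arrive out of order.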

Flink’s Uniqueness (40 minutes)

  • Presentation:
      • Deployment Patterns
      • Connectors
  • Exercise: Run a local Flink cluster, submit a simple Flink job, and open the Flink UI to see what the cluster and job look like
  • Presentation:
      • Exactly-Once Semantics and Checkpointing (a configuration sketch follows this agenda)
      • State and State Backends
  • Q&A
  • Break (5 min)
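
Exactly-once processing in Flink rests on checkpointing and state backends, and enabling them is mostly configuration. The sketch below shows one plausible setup, assuming a local filesystem path for checkpoint storage; production jobs commonly swap in the RocksDB state backend and a durable storage location.

    import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CheckpointingSketch {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Take a checkpoint every 10 seconds with exactly-once guarantees.
            env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);

            // Leave breathing room between checkpoints and bound how long one may take.
            env.getCheckpointConfig().setMinPauseBetweenCheckpoints(2_000L);
            env.getCheckpointConfig().setCheckpointTimeout(60_000L);

            // Keep operator state on the JVM heap; larger-than-memory state usually
            // moves to the RocksDB backend (extra dependency required).
            env.setStateBackend(new HashMapStateBackend());

            // Durable location for checkpoint data; this path is a placeholder.
            env.getCheckpointConfig().setCheckpointStorage("file:///tmp/flink-checkpoints");

            // Placeholder pipeline so the job has something to run.
            env.fromElements(1, 2, 3).print();

            env.execute("Checkpointing configuration sketch");
        }
    }

With this in place, Flink periodically snapshots all operator state and, on failure, rolls the job back to the latest consistent snapshot instead of losing or double-counting data.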

Processing streaming data with the Flink DataStream API and Flink SQL (40 minutes)

  • Presentation:
      • DataStream API in action
  • Exercise:
      • Query data in a Kafka topic with the Flink DataStream API (a sketch follows this agenda)
  • Presentation:
      • Stream-table duality: dynamic tables vs. static tables
      • Streaming SQL in action
  • Exercise:
      • Query data in a Kafka topic with Flink SQL (mirroring the Flink SQL sketch earlier in this description)
  • Q&A
  • Break (5 min)
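
The DataStream exercise has roughly the shape of the sketch below; it assumes Flink's Kafka connector is on the classpath, and the topic, brokers, and group id are placeholders. The Flink SQL exercise mirrors the Table API sketch shown earlier in this description.

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class KafkaDataStreamSketch {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Read raw string records from a Kafka topic.
            KafkaSource<String> source = KafkaSource.<String>builder()
                    .setBootstrapServers("localhost:9092")
                    .setTopics("clicks")
                    .setGroupId("flink-training")
                    .setStartingOffsets(OffsetsInitializer.earliest())
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();

            DataStream<String> lines =
                    env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-clicks");

            // Keep the transformation trivial: drop empty records and print the rest.
            lines.filter(line -> !line.isEmpty()).print();

            env.execute("Kafka DataStream sketch");
        }
    }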

Beyond Stream Processing: Unified Data Engine, Machine Learning/AI, and more (40 minutes)

  • Presentation:
      • Flink batch processing as a unified engine
      • Building a real-time data warehouse with Flink
  • Exercise: Use Flink to read and write Apache ORC files (a sketch follows this agenda)
  • Presentation:
      • Leveraging Flink in Machine Learning, AI, and Deep Learning
      • Flink for Serverless Applications
  • Q&A
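
The ORC exercise can be approached with Flink SQL's filesystem connector, as in the minimal sketch below. It assumes the Flink ORC format dependency is on the classpath and uses a placeholder local path; running it in batch mode also ties in with the idea of Flink as a unified engine.

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class OrcFilesystemSketch {

        public static void main(String[] args) throws Exception {
            // Batch mode keeps the filesystem write/read round trip simple.
            TableEnvironment tableEnv =
                    TableEnvironment.create(EnvironmentSettings.inBatchMode());

            // Declare a filesystem table stored as ORC files; the path is a placeholder.
            tableEnv.executeSql(
                    "CREATE TABLE orders_orc (" +
                    "  order_id STRING," +
                    "  amount DOUBLE" +
                    ") WITH (" +
                    "  'connector' = 'filesystem'," +
                    "  'path' = 'file:///tmp/orders_orc'," +
                    "  'format' = 'orc'" +
                    ")");

            // Write a few rows as ORC files, then read them back through the same table.
            tableEnv.executeSql("INSERT INTO orders_orc VALUES ('o-1', 12.5), ('o-2', 7.0)").await();
            tableEnv.executeSql("SELECT order_id, amount FROM orders_orc").print();
        }
    }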