Stream Processing and Beyond with Apache Flink
Getting started with real-time, low-latency data processing
Time is value, and it’s what your legacy big data infrastructure is missing. Businesses now need to extract more value from their real-time data to offer better service and respond to customers faster. That requires shifting from batch-oriented big data processing, with latencies of hours or even days, to stream processing, where data is processed as it arrives, with very low latency, in a scalable, fault-tolerant way. Apache Flink is driving this shift, helping businesses grow and act faster through its streaming-first architecture.
Join expert Bowen Li to learn how to leverage Flink’s building blocks and develop Flink applications. You’ll understand how Flink works and get hands-on experience using it to process streaming data in real time, with lessons you can immediately apply in your own work.
What you’ll learn and how you can apply it
By the end of this live online course, you’ll understand:
- How the Apache Flink community works
- Stream processing and Flink fundamental concepts
- What makes Flink fast, reliable, and scalable
- How to architect a native real-time data platform
- What the Apache Flink community has been actively working on beyond stream processing, in areas such as machine learning, AI, and serverless computing
And you’ll be able to:
- Set up a local Flink development and testing environment
- Run an end-to-end application with Flink SQL and Kafka
- Write Flink applications in Java with the DataStream API, and with Flink SQL
This training course is for you because...
- You’re a data engineer or software engineer who’s eager to build real-time data pipelines and platforms.
- You’re a product manager or business manager who wants to understand the use cases and functionalities offered by Flink and stream processing.
- You want to become an expert in stream processing, real-time ETL, and Apache Flink.
Prerequisites
- A basic understanding of Java programming and SQL
- Familiarity with big data tools and platforms like Hadoop
Recommended preparation
- Download and install the Flink 1.10 distribution and run a local Flink cluster on your computer (see the Flink 1.10 local setup instructions)
- Download the Flink exercises from the course GitHub repository, import them into your IDE (IntelliJ or Eclipse), and make sure the WordCount.java program runs successfully
- Download Apache Kafka and set up a local cluster by following the Kafka quickstart instructions
- Read the Flink docs “Stateful Computations over Data Streams,” “Use Cases,” “Applications,” “Operations,” and “Architecture”
About your instructor
Bowen is an Apache Flink committer and a senior engineer at Alibaba. He has worked on Flink for more than three years, with extensive experience developing and operating Flink at Alibaba at an unprecedented scale.
Besides committing code and reviewing designs, Bowen speaks frequently about Flink at conferences and events, evangelizing Flink and stream processing to make the world a little more real-time and data-driven, one step at a time.
Schedule
The timeframes are only estimates and may vary according to how the class is progressing.
Introduction to Apache Flink and stream processing (45 minutes)
- Presentation: Introduction to Flink; use cases; example architectures and pipelines based on Flink; event time and watermarks; advanced windowing
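To give a flavor of how event time and watermarks interact with windowing, here is a minimal plain-Java simulation. It is not Flink’s API: the Event record, the countPerWindow method, and the simplified watermark rule (largest timestamp seen so far minus a fixed out-of-orderness bound) are assumptions for illustration, loosely modeled on Flink’s bounded-out-of-orderness idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EventTimeWindowSketch {

    // Each event carries the time at which it happened (event time),
    // which may differ from the order in which it arrives.
    record Event(String id, long timestampMs) {}

    // Counts events in tumbling event-time windows of windowSizeMs.
    // Simplified rule (an assumption, not Flink's exact semantics):
    // watermark = max timestamp seen - maxOutOfOrdernessMs, and a window
    // [start, start + windowSizeMs) fires once the watermark reaches its end.
    // Events whose window has already fired are dropped as late.
    static List<String> countPerWindow(List<Event> arrivals, long windowSizeMs, long maxOutOfOrdernessMs) {
        TreeMap<Long, Integer> openWindows = new TreeMap<>(); // window start -> count
        List<String> results = new ArrayList<>();
        long watermark = Long.MIN_VALUE;

        for (Event e : arrivals) {
            long start = e.timestampMs() - Math.floorMod(e.timestampMs(), windowSizeMs);
            if (start + windowSizeMs <= watermark) {
                continue; // late: this event's window has already fired
            }
            openWindows.merge(start, 1, Integer::sum);

            // Advance the watermark; it never moves backwards.
            watermark = Math.max(watermark, e.timestampMs() - maxOutOfOrdernessMs);

            // Fire every open window whose end is at or before the watermark.
            while (!openWindows.isEmpty() && openWindows.firstKey() + windowSizeMs <= watermark) {
                Map.Entry<Long, Integer> w = openWindows.pollFirstEntry();
                results.add("[" + w.getKey() + ", " + (w.getKey() + windowSizeMs) + ") count=" + w.getValue());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Event> arrivals = List.of(
            new Event("a", 1_000), new Event("b", 4_000),
            new Event("c", 2_000),  // arrives out of order, but within the bound
            new Event("d", 12_000),
            new Event("e", 3_000),  // arrives after its window has fired
            new Event("f", 23_000));
        countPerWindow(arrivals, 10_000, 2_000).forEach(System.out::println);
    }
}
```

With a 10-second window and a 2-second out-of-orderness bound, the out-of-order event c is still counted in [0, 10000), while e arrives after that window has fired and is dropped. The window starting at 20000 stays open because no later event advances the watermark past its end.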
Break (5 minutes)
Inside Flink (40 minutes)
- Presentation: Flink state and state backends; exactly-once and at-least-once guarantees; checkpointing
- Hands-on exercise: Run and operate a local Flink cluster
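The difference between exactly-once and at-least-once results can be illustrated with a deliberately simplified plain-Java model of checkpoint-based recovery. It is not how Flink implements checkpointing (Flink coordinates snapshots with checkpoint barriers that flow through the dataflow); the run method, its parameters, and the simulated crash are hypothetical. The idea it shows is that the source position and the operator state are snapshotted together and restored together.

```java
import java.util.stream.LongStream;

public class CheckpointReplaySketch {

    // Sums a replayable input (think of a Kafka partition read by offset).
    // Every checkpointInterval records, (offset, sum) is snapshotted together.
    // At index failAt a crash is simulated: the source rewinds to the last
    // checkpointed offset. If restoreState is true, the sum is rolled back to
    // the same checkpoint, so every record affects the result exactly once;
    // if false, records replayed after the checkpoint are counted twice.
    static long run(long[] input, int checkpointInterval, int failAt, boolean restoreState) {
        long sum = 0;
        int offset = 0;
        long checkpointSum = 0;
        int checkpointOffset = 0;
        boolean crashed = false;

        while (offset < input.length) {
            if (offset == failAt && !crashed) {
                crashed = true;
                offset = checkpointOffset;   // source replays from the checkpoint
                if (restoreState) {
                    sum = checkpointSum;     // state rolls back to the same point
                }
                continue;
            }
            sum += input[offset++];
            if (offset % checkpointInterval == 0) {
                checkpointOffset = offset;   // snapshot the source position...
                checkpointSum = sum;         // ...together with the operator state
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] input = {1, 2, 3, 4, 5, 6, 7};
        System.out.println("expected   = " + LongStream.of(input).sum());
        System.out.println("restored   = " + run(input, 3, 5, true));
        System.out.println("no restore = " + run(input, 3, 5, false));
    }
}
```

For the input 1 through 7, with a checkpoint every three records and a crash just before the sixth record, restoring state yields 28, the same as the plain sum, while rewinding the source without rolling back state yields 37, because the records 4 and 5 are applied twice.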
Break (5 minutes)
Processing stream data with the Flink DataStream API and Flink SQL (40 minutes)
- Presentation: Connectors; deployment; the DataStream API; Flink SQL and stream-table duality
- Hands-on exercises: Build a streaming application with the Flink DataStream API; build a streaming application with Flink SQL and Kafka
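Stream-table duality can also be sketched in plain Java: an append-only stream of clicks feeds a continuously updated count per user, each table update is emitted as a changelog of add and retract messages, and replaying that changelog rebuilds the table. This is conceptually similar to the retract streams Flink SQL produces for updating queries such as a GROUP BY aggregation, but the Change record and both methods are hypothetical and not part of Flink’s API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StreamTableDualitySketch {

    // One changelog message: isAdd == true adds the row, false retracts it.
    record Change(boolean isAdd, String user, long count) {}

    // Stream -> table: continuously evaluates the equivalent of
    // "SELECT user, COUNT(*) FROM clicks GROUP BY user" over an append-only
    // stream, emitting each update as a retraction of the old row
    // followed by the new row.
    static List<Change> countClicks(List<String> clicks) {
        Map<String, Long> counts = new HashMap<>();
        List<Change> changelog = new ArrayList<>();
        for (String user : clicks) {
            Long old = counts.get(user);
            if (old != null) {
                changelog.add(new Change(false, user, old)); // retract the outdated row
            }
            long updated = (old == null) ? 1 : old + 1;
            counts.put(user, updated);
            changelog.add(new Change(true, user, updated));
        }
        return changelog;
    }

    // Changelog -> table: replaying the messages rebuilds the current table.
    static Map<String, Long> materialize(List<Change> changelog) {
        Map<String, Long> table = new TreeMap<>();
        for (Change c : changelog) {
            if (c.isAdd()) {
                table.put(c.user(), c.count());
            } else {
                table.remove(c.user(), c.count());
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Change> changelog = countClicks(List.of("alice", "bob", "alice"));
        changelog.forEach(System.out::println);
        System.out.println(materialize(changelog));
    }
}
```

For the clicks alice, bob, alice, the changelog contains four messages (the second alice click first retracts the row alice=1, then adds alice=2), and materializing it yields {alice=2, bob=1}.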
Break (5 minutes)
Beyond stream processing: Unified data engines, machine learning/AI, and more (40 minutes)
- Presentation: Flink batch; Flink + a traditional data warehouse; Flink + machine learning/deep learning; Flink + serverless