Flink基础教程
by Ellen Friedman, Kostas Tzoumas
August 2018 | Posts & Telecom Press | 98 pages | Chinese | Intermediate to advanced
Chapter 4: Handling Time
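The most important difference between programming for a stream processor and programming for a batch processor is how time is handled. Take a very simple example: counting. Event stream data (such as microblog posts, click data, and transaction data) is produced continuously. We need to group the events by key and, at a regular interval (say, every hour), count the events for each key. This is the well-known "big data" application, similar to the MapReduce word-count example.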
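The counting task described above can be sketched in a few lines of plain Python. This is only an illustration, not code from the book: the `hourly_counts` function, the `(timestamp, key)` event shape, and the sample data are all assumptions made for the example. Each event is assigned to the hour that contains its timestamp, and events are counted per hour and key.

```python
from collections import Counter

SECONDS_PER_HOUR = 3600

def hourly_counts(events):
    """Count events per (hour_start, key).

    `events` is an iterable of (timestamp_in_seconds, key) pairs;
    this event shape is a hypothetical choice for the sketch.
    """
    counts = Counter()
    for ts, key in events:
        # Round the timestamp down to the start of its hour.
        hour_start = ts - ts % SECONDS_PER_HOUR
        counts[(hour_start, key)] += 1
    return counts

# Made-up sample events spanning two hours.
events = [
    (0, "click"),
    (1800, "click"),
    (3599, "tweet"),
    (3600, "click"),
    (7199, "tweet"),
]

counts = hourly_counts(events)
for (hour_start, key), n in sorted(counts.items()):
    print(hour_start, key, n)
```

The sketch assumes the whole input is available at once. Deciding when an hour is complete while events are still arriving is exactly where the handling of time becomes the hard part.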
4.1 Counting with the Batch Architecture and the Lambda Architecture
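Although it looks simple, counting at scale is surprisingly difficult in practice. And counting is, of course, everywhere: aggregations and other operations on online analytical processing (OLAP) cubes can all be reduced to counting. Figure 4-1 shows how the counting task is implemented with a traditional batch architecture.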
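In this architecture, a continuous data ingestion pipeline creates a file every hour. These files are typically stored in a distributed file system such as HDFS or MapR-FS, and a tool like Apache Flume can be used to do this. A scheduler arranges for a batch job (such as a MapReduce job) to analyze the most recently generated file (grouping the events in the file by key and counting the events for each key) and then output the counts. Every company that uses Hadoop has multiple pipelines like this in its cluster.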
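The pipeline described above can be mimicked on a local file system. The following is a minimal sketch under stated assumptions, not the book's implementation: a temporary directory stands in for HDFS or MapR-FS, and the `ingest`, `batch_job`, and `run_scheduler` functions, the file naming, and the tab-separated line format are all hypothetical. Files are also cut here by the timestamp carried in each event, which is a simplification.

```python
import tempfile
from collections import Counter
from pathlib import Path

SECONDS_PER_HOUR = 3600

def ingest(events, data_dir):
    """Ingestion stand-in: append each event to the file for its hour."""
    for ts, key in events:
        hour = ts // SECONDS_PER_HOUR
        with open(data_dir / f"events-{hour:06d}.log", "a") as f:
            f.write(f"{ts}\t{key}\n")

def batch_job(path):
    """Batch job: group the events in one file by key and count them."""
    counts = Counter()
    for line in path.read_text().splitlines():
        _ts, key = line.split("\t")
        counts[key] += 1
    return dict(counts)

def run_scheduler(data_dir):
    """Scheduler stand-in: run one job per hourly file, oldest first."""
    files = sorted(data_dir.glob("events-*.log"))
    return {p.name: batch_job(p) for p in files}

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    # Made-up events: three in the first hour, one in the second.
    ingest([(10, "a"), (20, "b"), (3500, "a"), (3700, "a")], data_dir)
    results = run_scheduler(data_dir)

print(results)
```

Each job sees exactly one file, so no count can span an hour boundary and nothing is reported until the file for that hour has been closed and the job has run.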
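[Figure 4-1. Diagram labels: scheduler; time; serving and storage; file 1, file 2, file 3; job 1, job 2, job 3.]
Figure 4-1: Implementing a continuous application with periodically run batch jobs. Data is continuously split into ...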
ISBN: 9787115490063