Hadoop技術手冊 第三版
by Tom White
January 2013, Intermediate to advanced, 680 pages, 17h 18m, Chinese
GoTop Information, Inc.
4.4 Avro
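The Avro specification (http://avro.apache.org/docs/current/spec.html) precisely defines the binary format that all implementations must support. It also specifies many of the other Avro features that implementations should support. One area the specification does not rule on, however, is APIs: implementations have complete latitude in the API they expose for working with Avro data, because each one is necessarily language-specific. The fact that there is only one binary format is significant: it lowers the barrier to implementing a new language binding and avoids a combinatorial explosion of languages and formats, which would harm interoperability.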
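To make "one precisely defined binary format" concrete, here is a minimal sketch of a single rule from that format: the Avro specification encodes int and long values with variable-length zig-zag coding. This detail comes from the Avro specification rather than from the passage above, and the function names are hypothetical, not part of any Avro library.

```python
def zigzag_encode_long(n: int) -> bytes:
    """Encode a signed 64-bit integer as a zig-zag varint (hypothetical helper)."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("value out of range for a 64-bit long")
    # Zig-zag maps signed to unsigned: 0, -1, 1, -2, 2 ... -> 0, 1, 2, 3, 4 ...
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    # Emit 7 bits per byte, low-order group first; a set high bit means "more bytes follow".
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def zigzag_decode_long(data: bytes) -> int:
    """Decode one zig-zag varint from the start of data (hypothetical helper)."""
    z = 0
    shift = 0
    for b in data:
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break  # last byte of this value
        shift += 7
    else:
        raise ValueError("truncated varint")
    # Undo the zig-zag mapping.
    return (z >> 1) ^ -(z & 1)


if __name__ == "__main__":
    for value in (0, -1, 1, -64, 64):
        print(value, zigzag_encode_long(value).hex())
```

Because every implementation has to produce these same bytes for the same value, a new language binding only needs to match the byte layout; how it exposes the values to programmers is left to the binding.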
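Avro has rich schema resolution capabilities. Within certain carefully defined constraints, the schema used to read data need not be identical to the schema that was used to write it. This is the mechanism by which Avro supports schema evolution. For example, a new, optional field may be added to a record by declaring it in the schema used to read the old data. New and old clients alike can read the old data, while new clients can write new data that uses the new field. Conversely, if an old client sees newly encoded data, it ignores the new field and carries on processing as it would with old data.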
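The two directions described above can be sketched in a few lines. This is a simplified illustration that assumes records have already been decoded into Python dicts; a real Avro implementation applies resolution while decoding the binary data, using both the writer's and the reader's schema. The schemas, field names, and the `resolve_record` helper are all hypothetical.

```python
# Old schema: one field. New schema: adds an optional field with a default.
OLD_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "name", "type": "string"}],
}
NEW_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}


def resolve_record(datum: dict, reader_schema: dict) -> dict:
    """Project a record written with one schema onto the reader's schema."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in datum:
            # Field present in the written data: pass it through.
            resolved[name] = datum[name]
        elif "default" in field:
            # Field missing from the written data: use the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    # Fields the reader's schema does not declare are simply dropped.
    return resolved


if __name__ == "__main__":
    old_datum = {"name": "Ada"}
    new_datum = {"name": "Grace", "email": "grace@example.com"}
    print(resolve_record(old_datum, NEW_SCHEMA))  # new client, old data
    print(resolve_record(new_datum, OLD_SCHEMA))  # old client, new data
```

The default on the new field is what keeps old data readable by new clients; without it, the sketch raises an error instead of guessing a value.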
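Avro specifies an object container format for storing sequences of objects, similar to Hadoop's sequence file. An Avro data file has a metadata section, where the schema is stored, that describes the file. Avro data files support compression and are splittable, which is crucial for a MapReduce data input format. Furthermore, because Avro was designed to work with MapReduce, in the future Avro may become MapReduce ...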
Publisher Resources
ISBN: 9789862766682