Cloud Native Geospatial Analytics with Apache Sedona (Chinese Edition)

by Pawel Tokaj, Jia Yu, Mo Sarwat
December 2025
Beginner to intermediate
325 pages
4h 23m
Chinese
O'Reilly Media, Inc.
Content preview from Cloud Native Geospatial Analytics with Apache Sedona (Chinese Edition)

Chapter 8: Building a Geospatial Data Lakehouse with Apache Parquet and Apache Iceberg

This work has been translated using artificial intelligence. Your feedback and comments are welcome: translation-feedback@oreilly.com

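Geospatial data formats have long lagged behind the data analytics technologies that emerged in the mid-2010s. Neither Apache Parquet nor Apache ORC offered native support for geospatial data. People did store geospatial data in Apache Parquet using binary encodings such as WKB, but this approach failed to exploit one of the core strengths of the Parquet format: efficient data skipping based on statistics.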
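To make statistics-based skipping concrete, here is a minimal sketch in plain Python. It is not how Apache Parquet or any reader is implemented; `RowGroup`, `intersects`, and `query` are hypothetical names, and the assumption is simply that each chunk of data carries a bounding box covering all of its geometries. A chunk whose bounding box does not intersect the query window can be skipped without decoding a single geometry, which is exactly what an opaque WKB blob without spatial statistics cannot offer.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (min_x, min_y, max_x, max_y)
BBox = Tuple[float, float, float, float]
Point = Tuple[float, float]


@dataclass
class RowGroup:
    """Hypothetical stand-in for a chunk of rows plus its spatial statistics."""
    name: str
    bbox: BBox            # bounding box covering every point in the chunk
    points: List[Point]   # the actual data, only touched if the bbox matches


def intersects(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes overlap (touching edges count as overlap)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query(row_groups: List[RowGroup], window: BBox):
    """Return the points inside the window and the names of chunks actually read."""
    hits, scanned = [], []
    for rg in row_groups:
        if not intersects(rg.bbox, window):
            continue  # skipped using statistics alone, no data decoded
        scanned.append(rg.name)
        for x, y in rg.points:
            if window[0] <= x <= window[2] and window[1] <= y <= window[3]:
                hits.append((x, y))
    return hits, scanned


row_groups = [
    RowGroup("rg-0", (0, 0, 10, 10), [(1, 1), (9, 9)]),
    RowGroup("rg-1", (20, 20, 30, 30), [(21, 22), (29, 28)]),
    RowGroup("rg-2", (5, 5, 25, 25), [(6, 6), (24, 24)]),
]

hits, scanned = query(row_groups, (0, 0, 8, 8))
print(hits)     # points that fall inside the query window
print(scanned)  # chunks that had to be read; rg-1 is never touched
```

The bounding box is a coarse filter: `rg-2` is still read even though only one of its points matches, so the exact test on individual points remains necessary after the skip.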

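In late 2021, the GeoParquet file format specification emerged to address the lack of geospatial support in data lakes. Around the same time, Apache Iceberg gained wide attention as a table format that solves several pain points in data analytics, including missing transactions, handling of partial failures, and schema evolution. The data lakehouse era had begun. However, although table formats such as Apache Iceberg, Delta Lake, and Apache Hudi grew in popularity, none of them supported geospatial data at the time. Thanks to the efforts of the geospatial community, Apache Parquet and Apache Iceberg both introduced geospatial data types in early 2025. The ecosystem as a whole still needs to adapt and implement readers and writers for the new specifications. Apache Sedona is one of the pioneers in this area, and you will soon be able to use it to load and store data in the Apache Parquet and Apache Iceberg formats. We will update the book's code repository once all the integrations are ready.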

Overview of the Data Lakehouse Architecture

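The core difference between a data lake and a data lakehouse is the open table format (such as Apache Iceberg or Delta Lake). These formats introduce optimization techniques, transactions, and data versioning, which make the lakehouse more efficient. If a data lake is a paper map, a data lakehouse is a GPS navigation system. Maps are excellent tools that people have used for thousands of years, but planning the best route with one, especially for a long trip, is time-consuming and frustrating. A navigation system automatically provides an optimized route from origin to destination, taking real-time traffic and personal preferences into account, which makes the journey easier and safer.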
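The transactions and data versioning mentioned above can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the metadata layout or API of Apache Iceberg or Delta Lake: `ToyTable`, `Snapshot`, `commit`, and `files_as_of` are hypothetical names. The idea it shows is that every successful commit produces a new immutable snapshot listing the data files that make up the table, so a failed write leaves the table unchanged and older versions remain readable.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Immutable list of the data files that make up one table version."""
    snapshot_id: int
    files: Tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical table: an append-only log of snapshots."""
    snapshots: List[Snapshot] = field(default_factory=lambda: [Snapshot(0, ())])

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def commit(self, add: Iterable[str] = (), remove: Iterable[str] = ()) -> Snapshot:
        """Atomically swap in a new snapshot, or raise and change nothing."""
        add, remove = tuple(add), tuple(remove)
        base = self.current()
        missing = [f for f in remove if f not in base.files]
        if missing:
            # Validation happens before anything is appended to the log,
            # so a rejected commit is invisible to readers.
            raise ValueError(f"cannot remove unknown files: {missing}")
        files = tuple(f for f in base.files if f not in remove) + add
        snapshot = Snapshot(base.snapshot_id + 1, files)
        self.snapshots.append(snapshot)
        return snapshot

    def files_as_of(self, snapshot_id: int) -> Tuple[str, ...]:
        """Read an older version of the table (time travel)."""
        for snapshot in self.snapshots:
            if snapshot.snapshot_id == snapshot_id:
                return snapshot.files
        raise KeyError(snapshot_id)


table = ToyTable()
table.commit(add=["buildings-001.parquet"])
table.commit(add=["buildings-002.parquet"])
# Compaction: replace two small files with one, in a single commit.
table.commit(
    add=["buildings-compacted.parquet"],
    remove=["buildings-001.parquet", "buildings-002.parquet"],
)

print(table.current().files)   # the latest version of the table
print(table.files_as_of(2))    # the version before compaction
```

Real table formats add much more on top of this, such as concurrent-writer conflict detection and per-file statistics, but the snapshot log is the piece that turns a directory of files into something with versions.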

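This book is not a guide to data modeling; the data lakehouse architecture is essentially about data tooling, not a modeling methodology. We strongly recommend the other excellent books available on data modeling for the data lakehouse.1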

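The core components of a data lakehouse are its query engines, which help load and transform data. Unified file formats and table specifications make these engines interchangeable, so users are no longer locked into a single tool and can choose the best option for each use case. The components of a spatial data lakehouse are shown in Figure 8-1.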

Diagram of a data lakehouse architecture highlighting the separation of storage and compute, with storage using object storage, open file formats, and table formats, and compute involving various query engines.
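Figure 8-1. A data lakehouse consists of separate storage and compute, using object storage (such as S3 and GCS) and open file formats (such as Parquet and ORC) to optimize query performance and data processing costs2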

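One of the most common design patterns in a data lakehouse is the medallion architecture, which has three layers: bronze, silver, and gold. Data quality, structure, and reliability increase with each layer. The bronze layer stores raw data, such as audio files, images, and unstructured files from external systems. The silver layer stores data that has been cleaned, structured, and tested. The gold layer holds business-level aggregates that can be consumed directly by external teams, BI tools, or machine learning algorithms. Take a data product for the real estate industry as an example: the bronze layer stores raw satellite imagery, the silver layer identifies building footprints, and the gold layer holds building information enriched with neighborhood data such as public transit coverage.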
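As a rough illustration of how data moves through the three layers, the following plain-Python sketch loosely follows the real estate example. Everything in it is hypothetical: the records, the field names, the `transit_coverage` lookup, and the `to_silver` and `to_gold` functions are made up for illustration, and the bronze layer is assumed to already hold raw building records rather than satellite imagery. In practice each step would be a job in a query engine writing to its own table.

```python
# Bronze: raw records exactly as they arrived, including bad and duplicate rows.
bronze = [
    {"id": "b1", "neighborhood": "north", "area_m2": "120.5"},
    {"id": "b2", "neighborhood": "north", "area_m2": "n/a"},    # unparsable area
    {"id": "b3", "neighborhood": "south", "area_m2": "80"},
    {"id": "b1", "neighborhood": "north", "area_m2": "120.5"},  # duplicate
]

# Hypothetical neighborhood data used to enrich the gold layer.
transit_coverage = {"north": 0.9, "south": 0.4}


def to_silver(records):
    """Clean and structure: parse types, drop invalid rows, deduplicate by id."""
    seen, silver = set(), []
    for record in records:
        try:
            area = float(record["area_m2"])
        except ValueError:
            continue  # reject rows that fail validation
        if record["id"] in seen:
            continue
        seen.add(record["id"])
        silver.append(
            {"id": record["id"], "neighborhood": record["neighborhood"], "area_m2": area}
        )
    return silver


def to_gold(silver, coverage):
    """Aggregate per neighborhood and enrich with transit coverage."""
    gold = {}
    for record in silver:
        summary = gold.setdefault(
            record["neighborhood"],
            {
                "buildings": 0,
                "total_area_m2": 0.0,
                "transit_coverage": coverage.get(record["neighborhood"]),
            },
        )
        summary["buildings"] += 1
        summary["total_area_m2"] += record["area_m2"]
    return gold


silver = to_silver(bronze)
gold = to_gold(silver, transit_coverage)
print(silver)
print(gold)
```

Keeping the bronze records untouched is the point of the pattern: if the cleaning rules in `to_silver` change, the silver and gold layers can be rebuilt from the raw data.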

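Data processing has seen major breakthroughs that make it easier to manage and more reliable. Looking back at the data ecosystem of 2017, almost no one paid attention to data quality or observability; most people were struggling with other problems, such as how to keep Apache Hadoop running stably through permission controls. Yet even after years of progress, few of the people who manage geospatial data have truly benefited. Until recently, tables could not define geometry or raster types, and metadata carried no location information that could speed up loading. Essentially, Apache ...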

Publisher Resources

ISBN: 0642572292300