Presto实战

by Matt Fuller, Manfred Moser, Martin Traverso
March 2021
Intermediate to advanced
265 pages
6h 50m
Chinese
Posts & Telecom Press
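Therefore, the data located at s3://example-org/page_views may already exist. Once the table is created in Presto, you can start querying it. When you configure the Hive connector against an existing Hive warehouse, you can see the existing tables and query them immediately.
Alternatively, you can create the table in an empty directory and expect the data to be loaded later, either by Presto or by an external source. In both cases, Presto requires that the directory structure already exists; otherwise, the DDL fails. The most common scenario for creating an external table is sharing data with other tools.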
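As a minimal sketch of what "querying immediately" can look like, the statements below list and sample a table exposed by the Hive connector. The catalog name hive, the schema web, and the table page_views are taken from the example used in this section; the assumption is that your catalog is configured under the same names.

```sql
-- List the tables that the Hive connector exposes in the web schema
-- (catalog and schema names are assumptions based on this section's example).
SHOW TABLES FROM hive.web;

-- Inspect the columns of the existing table.
DESCRIBE hive.web.page_views;

-- Sample a few rows to confirm that the data at the table location is readable.
SELECT *
FROM hive.web.page_views
LIMIT 10;
```

If the table points to an empty directory, the last query simply returns no rows until Presto or an external process writes data there.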
6.4.5 Partitioned Data
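So far, you have seen that the data for a table, whether managed or external, is stored as one or more files in a directory. Data partitioning extends this idea: it horizontally divides a logical table into smaller pieces of data called partitions. The concept itself derives from partitioning schemes in RDBMSs. Hive introduced this technique for data in HDFS to achieve better query performance and to make the data easier to manage. Partitioning is now a standard data organization strategy in distributed filesystems such as HDFS and in object stores such as S3.
Let's use the following table to demonstrate partitioning: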
CREATE TABLE hive.web.page_views (
view_time timestamp,
user_id bigint,
page_url varchar,
view_date date
)
WITH (
partitioned_by = ARRAY['view_date']
)
The columns listed in the partitioned_by clause must be the last columns defined in the DDL; otherwise, Presto reports an error.
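To illustrate the column-order rule with more than one partition column, here is a hedged sketch. The table name page_views_by_country, the country column, and the literal values in the query are hypothetical and are not part of the original example. The partition columns appear last in the column list, in the same order as in the partitioned_by array.

```sql
-- Hypothetical variant of the table with two partition columns.
-- Partition columns come last, in the same order as in partitioned_by.
CREATE TABLE hive.web.page_views_by_country (
  view_time timestamp,
  user_id bigint,
  page_url varchar,
  view_date date,    -- first partition column
  country varchar    -- second partition column (assumed for illustration)
)
WITH (
  partitioned_by = ARRAY['view_date', 'country']
);

-- Filtering on the partition columns lets the engine read only the
-- matching partition directories instead of scanning the whole table.
SELECT count(*) AS views
FROM hive.web.page_views_by_country
WHERE view_date = DATE '2021-03-01'  -- illustrative value
  AND country = 'US';                -- illustrative value

-- The Hive connector exposes a hidden table that lists existing partitions.
SELECT *
FROM hive.web."page_views_by_country$partitions";
```

Moving view_date or country ahead of a regular column, or listing them in a different order than in the array, is the kind of definition that the rule above rejects.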
As with nonpartitioned tables, ...
Publisher Resources

ISBN: 9787115560056