Presto实战
by Matt Fuller, Manfred Moser, Martin Traverso
March 2021
Intermediate to advanced
265 pages
6h 50m
Chinese
Posts & Telecom Press
Content preview from Presto实战
Connectors
• TEXTFILE
• RCTEXT
• RCBINARY
• CSV
• SEQUENCEFILE
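The three file formats most commonly used with Presto are ORC, Parquet, and Avro data files. The readers for the ORC, Parquet, RCText, and RCBinary formats are heavily optimized in Presto for performance.

The metadata in the HMS includes the file format information, so Presto knows which reader to use when reading the data files. When you create a table in Presto, the file format defaults to ORC, but you can override the default in the WITH properties of the CREATE TABLE statement: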
CREATE TABLE hive.web.page_views (
  view_time timestamp,
  user_id bigint,
  page_url varchar,
  ds date,
  country varchar
)
WITH (
  format = 'ORC'
)
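The default storage format for all tables in a catalog can be set with the hive.storage-format configuration property in the catalog properties file.

By default, Presto uses the GZIP compression codec when writing files. You can change the codec to SNAPPY or NONE by setting the hive.compression-codec configuration property in the catalog properties file.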
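
As a rough illustration of the two settings described above, a catalog properties file might contain lines like the following. The file name and the chosen values (PARQUET, SNAPPY) are assumptions picked for the example, not taken from the book; only the two property names come from the text.

```properties
# Hypothetical file: etc/catalog/hive.properties (name and location assumed)

# Default storage format for new tables in this catalog (ORC if unset)
hive.storage-format=PARQUET

# Compression codec used when writing files (GZIP if unset)
hive.compression-codec=SNAPPY
```

A table created with an explicit format in its WITH properties still uses that format; the catalog setting only supplies the default.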
6.4.8 MinIO Example
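MinIO is a lightweight, S3-compatible distributed storage system that you can access with Presto and the Hive connector. If you want to explore its use in more detail, check out our example project. ...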

Publisher Resources

ISBN: 9787115560056