Skip to Content
For Enterprise
For Government
For Higher Ed
For Individuals
For Marketing
For Enterprise
For Government
For Higher Ed
For Individuals
For Marketing
Explore Skills
Cloud Computing
Microsoft Azure
Amazon Web Services (AWS)
Google Cloud
Cloud Migration
Cloud Deployment
Cloud Platforms
Data Engineering
Data Warehouse
SQL
Apache Spark
Microsoft SQL Server
MySQL
Kafka
Data Lake
Streaming & Messaging
NoSQL Databases
Relational Databases
Data Science
Pandas
R
MATLAB
SAS
D3
Power BI
Tableau
Statistics
Exploratory Data Analysis
Data Visualization
AI & ML
Generative AI
Machine Learning
Artificial Intelligence (AI)
Deep Learning
Reinforcement Learning
Natural Language Processing
TensorFlow
Scikit-Learn
Hyperparameter Tuning
MLOps
Programming Languages
Java
JavaScript
Spring
Python
Go
C#
C++
C
Swift
Rust
Functional Programming
Software Architecture
Object-Oriented
Distributed Systems
Domain-Driven Design
Architectural Patterns
IT/Ops
Kubernetes
Docker
GitHub
Terraform
Continuous Delivery
Continuous Integration
Database Administration
Computer Networking
Operating Systems
IT Certifications
Security
Network Security
Application Security
Incident Response
Zero Trust Model
Disaster Recovery
Penetration Testing / Ethical Hacking
Governance
Malware
Security Architecture
Security Engineering
Security Certifications
Design
Web Design
Graphic Design
Interaction Design
Film & Video
User Experience (UX)
Design Process
Design Tools
Business
Agile
Project Management
Product Management
Marketing
Human Resources
Finance
Team Management
Business Strategy
Digital Transformation
Organizational Leadership
Soft Skills
Professional Communication
Emotional Intelligence
Presentation Skills
Innovation
Critical Thinking
Public Speaking
Collaboration
Personal Productivity
Confidence / Motivation
Features
All features
Verifiable skills
AI Academy
Courses
Certifications
Interactive learning
Live events
Superstreams
Answers
Insights reporting
Radar Blog
Buy Courses
Plans
Sign In
Try Now
O'Reilly Platform
book
Presto实战
by
Matt Fuller
,
Manfred Moser
,
Martin Traverso
March 2021
Intermediate to advanced
265 pages
6h 50m
Chinese
Posts & Telecom Press
Content preview from
Presto实战
连接器
|
87
因此,位于
s3://example-org/page_views
中的数据可能已经存在。一旦在
Presto
中创建了
表,你就可以开始查询它了。当你将
Hive
连接器配置到现有的
Hive
仓库中时
,可以看到
现有的表,并且能够立即对这些表进行查询。
或者,你可以在空的目录中创建表,并期望数据被
Presto
或外部源加载进来
。在这两种情
况下,
Presto
都要求已经创建了目录结构
;否则,
DDL
会出错。创建外部表最常见的场景
是与其他工具共享数据时。
6.4.5
分区数据
目前,你已经了解了一个表的数据,不管是内部的还是外部的,都是以一个或多个文件的
形式存储在一个目录中。
数据分区
是这一点的延伸,它将逻辑表横向划分为小块数据,称
为分区。
这个概念本身源于
RDBMS
中的分区
schema
。
Hive
将这种技术引入
HDFS
中的数据
,用
于实现更好的查询性能并提升数据的可管理性。
在分布式文件系统(如
HDFS
)和对象存储(如
S3
)中,分区已成为标准的数据组织策略。
让我们用这个表的例子来演示一下分区:
CREATE TABLE hive.web.page_views (
view_time timestamp,
user_id bigint,
page_url varchar,
view_date date
)
WITH (
partitioned_by = ARRAY['view_date']
)
partitioned_by
子句中列出的列必须是
DDL
中定义的最后一列,否则,
Presto
会报错。
与非分区表一样, ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial
You might also like
大数据项目管理:从规划到实现
Ted Malaska, Jonathan Seidman
数据库系统内幕
Alex Petrov
云原生:运用容器、函数计算和数据构建下一代应用
Boris Scholl, Trent Swanson, Peter Jausovec
Google系统架构解密: 构建安全可靠的系统
Heather Adkins, Betsy Beyer, Paul Blankinship, Piotr Lewandowski, Ana Oprea, Adam Stubblefield
Publisher Resources
ISBN: 9787115560056