Kudu:构建高性能实时数据分析存储系统

by Jean-Marc Spaggiari, Mladen Kovacevic, Brock Noland, Ryan Bosshart
March 2019
Intermediate to advanced
201 pages
3h 22m
Chinese
Publishing House of Electronics Industry
Chapter 1: Why Kudu?
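... the line becomes blurred. For example, one major problem with using a columnar database for OLTP workloads is that OLTP workloads usually read most of the fields in a row, which in a column store means many disk seeks. Using SSDs or persistent memory largely avoids this problem.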
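The seek cost described above can be illustrated with a toy model. This is a minimal sketch, not how any particular engine lays out data: the table contents and the helper names `to_columnar` and `read_row_from_columns` are hypothetical, and "locations touched" stands in for disk seeks while ignoring caching, compression, and block layout.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def to_columnar(rows: List[Row]) -> Dict[str, List[object]]:
    """Pivot row dicts into one list per column (assumes identical keys per row)."""
    columns: Dict[str, List[object]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def read_row_from_rows(rows: List[Row], index: int) -> Tuple[Row, int]:
    """Row layout: all fields of a row sit together, so one location is touched."""
    return rows[index], 1


def read_row_from_columns(
    columns: Dict[str, List[object]], index: int
) -> Tuple[Row, int]:
    """Column layout: each field lives in its own column, one location per column."""
    row = {name: values[index] for name, values in columns.items()}
    return row, len(columns)


# Hypothetical three-column table used only for illustration.
rows = [
    {"id": 1, "name": "alice", "balance": 10},
    {"id": 2, "name": "bob", "balance": 25},
]
columns = to_columnar(rows)

row_a, touched_row_layout = read_row_from_rows(rows, 1)
row_b, touched_column_layout = read_row_from_columns(columns, 1)
```

Both reads return the same row, but the column layout touches one location per column read. On spinning disks each of those is potentially a seek, which is why low-latency media such as SSDs change the trade-off.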
Comparison with Big Data Components: HDFS, HBase, and Cassandra
With that background, how does Kudu compare with other big data storage systems such as HDFS, HBase, and Cassandra? Let's first look at where each of these systems is strong.
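HDFS is very good at scanning large volumes of data. In other words, its "full table scans" are excellent, and this is a very common operation in analytic workloads. HBase and Cassandra are good at random access, that is, randomly reading or modifying data. HDFS is poor at random reads and, strictly speaking, does not support random writes. You can simulate random writes by merging, but doing so is expensive. HBase and Cassandra perform far worse than HDFS on large scans.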
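To see why simulating random writes by merging is expensive, consider the following minimal sketch. It does not use the HDFS API: `WriteOnceStore` is a hypothetical in-memory stand-in for a file system without in-place updates, and `merge_update`, the paths, and the JSON-lines record format are assumptions made for illustration.

```python
import json
from typing import Dict, List


class WriteOnceStore:
    """Hypothetical stand-in for a file system where files cannot be modified in place."""

    def __init__(self) -> None:
        self._files: Dict[str, str] = {}

    def create(self, path: str, data: str) -> None:
        # Existing files cannot be overwritten, only new files created.
        if path in self._files:
            raise FileExistsError(path)
        self._files[path] = data

    def read(self, path: str) -> str:
        return self._files[path]


def merge_update(
    store: WriteOnceStore,
    base_path: str,
    new_path: str,
    updates: Dict[int, Dict[str, object]],
) -> int:
    """Apply updates keyed by 'id' by rewriting the whole file.

    Returns the number of rows rewritten, which is the cost of the "write".
    """
    rows: List[Dict[str, object]] = [
        json.loads(line) for line in store.read(base_path).splitlines()
    ]
    merged = [{**row, **updates.get(row["id"], {})} for row in rows]
    store.create(new_path, "\n".join(json.dumps(r) for r in merged))
    return len(merged)


store = WriteOnceStore()
store.create(
    "/data/v1",
    "\n".join(json.dumps({"id": i, "balance": 0}) for i in range(1, 4)),
)

# Changing a single row still rewrites all three rows into a new file.
rows_rewritten = merge_update(store, "/data/v1", "/data/v2", {2: {"balance": 99}})
new_rows = [json.loads(line) for line in store.read("/data/v2").splitlines()]
```

Updating one record rewrites every record in the file, so the cost of a single "random write" grows with file size rather than with the size of the change.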
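Kudu's goal is to bring scan performance to within a factor of two of Parquet on HDFS, while bringing random read performance close to that of HBase and Cassandra. The concrete performance target is a random read/write latency of under 1 ms on SSD.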
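Let's go through, one by one, why HDFS, HBase, and Cassandra have these performance characteristics. HDFS is purely a distributed file system, designed to perform fast, large-scale scans. After all, the original use case for this design was building Web indexes in batch. For that scenario and many others, all you need is the ability to scan the entire dataset efficiently. HDFS partitions the data and spreads it across a large number of physical disks so that these large scans can ...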
Publisher Resources

ISBN: 9787121295416