
Kudu：构建高性能实时数据分析存储系统 (Kudu: Building High-Performance Real-Time Data Analytics Storage Systems)
ISBN: 9787121295416

by Jean-Marc Spaggiari, Mladen Kovacevic, Brock Noland, Ryan Bosshart

March 2019

Intermediate to advanced

201 pages

3h 22m

Chinese

Publishing House of Electronics Industry


Chapter: Why Kudu?

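...the line becomes blurred. For example, a major problem with using a columnar database to support OLTP workloads is that OLTP workloads typically read most of the fields in a row, which in columnar storage means many disk seeks. Using SSDs or persistent memory, however, largely avoids this problem.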
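To make the seek argument concrete, the following toy Python model (not Kudu's real storage format) stores a small, hypothetical three-column table both ways and counts how many separate storage locations each access touches; on a spinning disk, each location would cost roughly one seek. All names here are illustrative.

```python
# A hypothetical three-column table used only for illustration.
COLUMNS = ("id", "name", "age")
rows = [
    (1, "alice", 30),
    (2, "bob", 25),
    (3, "carol", 41),
]

# Row-oriented layout: each record's fields are stored next to each other.
row_store = list(rows)

# Column-oriented layout: one contiguous array (think "one file") per column.
column_store = {name: [r[i] for r in rows] for i, name in enumerate(COLUMNS)}


def read_row_from_row_store(store, idx):
    """Fetch a full record: a single contiguous location is touched."""
    locations_touched = 1
    return store[idx], locations_touched


def read_row_from_column_store(store, idx):
    """Fetch a full record: every column array must be visited."""
    record = tuple(store[name][idx] for name in COLUMNS)
    locations_touched = len(COLUMNS)
    return record, locations_touched


def scan_column_from_column_store(store, name):
    """Analytic aggregate over one column: only that column's array is read."""
    locations_touched = 1
    return sum(store[name]), locations_touched
```

Reading one full row costs one location in the row layout but three in the columnar layout, while a single-column aggregate such as summing `age` touches only one location in the columnar layout. This is why wide-row OLTP reads are expensive for columnar stores on seek-bound media, while narrow analytic scans suit them well.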

Comparison with Big Data Components: HDFS, HBase, and Cassandra

With this background, how does Kudu compare with other big data storage systems such as HDFS, HBase, and Cassandra? Let's first look at where each of these systems excels.

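HDFS is very good at scanning large volumes of data; in other words, it excels at full table scans, an operation that is very common in analytical workloads.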

HBase and Cassandra are good at random access, that is, reading or modifying individual records at arbitrary positions.

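HDFS is poor at random reads and, strictly speaking, does not support random writes at all. You can simulate random writes by merging changes into rewritten files, but doing so is expensive.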
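The merge workaround can be sketched as follows. This is a simplified, in-memory model that assumes the data is a key-sorted list of (key, value) pairs; in a real HDFS deployment the "file" would be one or more immutable files rewritten by a batch job, and `merge_updates` is a hypothetical name. The point is the cost: changing a single record still means reading and rewriting the whole dataset.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str]  # (key, value); a hypothetical record shape


def merge_updates(base: List[Record], updates: Dict[int, str]) -> List[Record]:
    """Simulate "random writes" on an immutable, key-sorted file.

    The base file is never modified in place; instead every record is read and
    a brand-new file is produced with the updates applied. Updating even one
    key therefore costs O(len(base)) reads and writes.
    """
    merged: List[Record] = []
    for key, value in base:
        # Use the new value if this key was updated, otherwise copy the old one.
        merged.append((key, updates.get(key, value)))
    existing = {key for key, _ in base}
    # Keys that did not exist before are inserts; append them, then restore order.
    merged.extend((k, v) for k, v in updates.items() if k not in existing)
    merged.sort(key=lambda r: r[0])
    return merged
```

For example, merging `{2: "B", 4: "d"}` into `[(1, "a"), (2, "b"), (3, "c")]` yields `[(1, "a"), (2, "B"), (3, "c"), (4, "d")]`, leaving the original list untouched, just as the original HDFS file would remain unchanged while a new one is written.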

HBase and Cassandra, in turn, perform much worse than HDFS at large-scale scans.

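Kudu aims to keep its scan performance within a factor of two of Parquet on HDFS, while bringing its random-read performance close to that of HBase and Cassandra. The concrete target is random read/write latency under 1 ms on SSDs.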
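The two design goals can be written as a simple pass/fail check. The sketch below assumes that "within a factor of two" means Kudu's scan time is at most twice the Parquet-on-HDFS scan time for the same query, and that latency is a single representative figure (for example, a measured median). The function name, constants, and sample numbers are hypothetical, not published benchmark results.

```python
SCAN_RATIO_LIMIT = 2.0   # assumed: scan time at most 2x that of Parquet on HDFS
LATENCY_LIMIT_MS = 1.0   # random read/write latency must be under 1 ms on SSD


def meets_kudu_targets(kudu_scan_s: float, parquet_scan_s: float,
                       random_rw_latency_ms: float) -> bool:
    """Check hypothetical benchmark numbers against the stated design goals."""
    # How many times slower (or faster, if < 1) Kudu's scan is than Parquet's.
    scan_ratio = kudu_scan_s / parquet_scan_s
    return scan_ratio <= SCAN_RATIO_LIMIT and random_rw_latency_ms < LATENCY_LIMIT_MS
```

For instance, `meets_kudu_targets(15.0, 10.0, 0.8)` returns `True` (a 1.5x scan ratio and 0.8 ms latency), while raising the latency to 1.2 ms makes it return `False`.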

Let's look at each system in turn to see why HDFS, HBase, and Cassandra have these performance characteristics.

HDFS is a pure distributed file system designed specifically for fast, large-scale scans; after all, its original use case was building web indexes in batch.

For that use case, and for many others, all you need is the ability to scan the entire dataset efficiently.

HDFS partitions data and spreads it across a large number of physical disks so that these large scans can ...
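HDFS's approach of splitting data and spreading it over many physical disks can be modeled in a few lines. This is a deliberately simplified sketch: it uses tiny byte blocks as a stand-in for real HDFS blocks (128 MB by default in recent Hadoop versions), places them round-robin with no replication or rack awareness, and uses threads to stand in for independent disks. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks: List[bytes], num_disks: int) -> Dict[int, List[int]]:
    """Assign block indices to disks round-robin (a simplification)."""
    placement: Dict[int, List[int]] = {d: [] for d in range(num_disks)}
    for idx in range(len(blocks)):
        placement[idx % num_disks].append(idx)
    return placement


def scan_disk(blocks: List[bytes], block_ids: List[int]) -> int:
    """Read every block stored on one disk and return the bytes read."""
    return sum(len(blocks[i]) for i in block_ids)


def parallel_scan(blocks: List[bytes], placement: Dict[int, List[int]]) -> int:
    """Scan all disks concurrently and return the total bytes read."""
    with ThreadPoolExecutor(max_workers=len(placement)) as pool:
        # One task per disk; each task reads only the blocks placed on that disk.
        per_disk = pool.map(lambda ids: scan_disk(blocks, ids), placement.values())
        return sum(per_disk)
```

Splitting a 1,000-byte file into 128-byte blocks gives eight blocks; spread over three disks, no disk holds more than three of them, so each worker reads only a fraction of the file while the total bytes read still equals the file size.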
