Java数据分析指南 (Java Data Analysis Guide)

by Posts & Telecom Press, John R. Hubbard
May 2024
Intermediate to advanced
347 pages
5h 38m
Chinese
Packt Publishing
Content preview from Java数据分析指南

Chapter 11  Big Data Analysis with Java

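"In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers."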

- Grace Hopper (1906-1992)

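The term big data generally refers to the algorithms used to store, retrieve, and analyze data sets that are too large to be managed by a single file server. Commercially, these algorithms were pioneered by Google, and this chapter examines two of its early benchmark algorithms: PageRank and MapReduce.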

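Tip

In the 1930s, the nine-year-old nephew of the American mathematician Edward Kasner coined the word "googol" to stand for 10^100. At that time, the number of particles in the universe was estimated to be about 10^80. Kasner later coined the word "googolplex" to stand for 10^googol, a number that would be written as 1 followed by 10^100 zeros. Google's headquarters in Mountain View, California, is called the Googleplex.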

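Relational databases (Rdbs) are not well suited to managing very large databases. As we saw in Chapter 10, "NoSQL Databases", that was one of the main reasons NoSQL databases were developed.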

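In general, there are two ways to manage large data sets that keep growing: vertical scaling and horizontal scaling. Vertical scaling is the strategy of increasing the capacity of a single server by upgrading to a more powerful CPU, more main memory, and more storage space. Horizontal scaling redistributes the data set across a larger number of servers in the system. The main advantage of vertical scaling is that it requires no significant changes to the existing software; its main disadvantage is that it is more tightly constrained than horizontal scaling, since a single machine can only be upgraded so far. The main problem with horizontal scaling is that it does require adjustments to the software. But as we will see, frameworks like MapReduce make many of those adjustments manageable. ...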
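To make the software adjustment behind horizontal scaling concrete, here is a minimal sketch that spreads a set of keys over several servers using a hash of each key. It is an illustration only, not the book's code: the class name `HorizontalScalingSketch`, the methods `serverFor` and `partition`, and the sample keys are all hypothetical, and simple modulo hashing is just one assumed way of distributing data.

```java
import java.util.ArrayList;
import java.util.List;

public class HorizontalScalingSketch {

    // Picks a server index for a key (hypothetical helper).
    // Math.floorMod keeps the result non-negative even if hashCode() is negative.
    static int serverFor(String key, int serverCount) {
        return Math.floorMod(key.hashCode(), serverCount);
    }

    // Splits the keys into one bucket per server.
    static List<List<String>> partition(List<String> keys, int serverCount) {
        List<List<String>> servers = new ArrayList<>();
        for (int i = 0; i < serverCount; i++) {
            servers.add(new ArrayList<>());
        }
        for (String key : keys) {
            servers.get(serverFor(key, serverCount)).add(key);
        }
        return servers;
    }

    public static void main(String[] args) {
        // Made-up sample keys standing in for records of a large data set.
        List<String> keys = List.of("alpha", "bravo", "charlie", "delta", "echo", "foxtrot");
        // Scale out from 2 servers to 3: the same data, redistributed.
        for (int n = 2; n <= 3; n++) {
            System.out.println(n + " servers: " + partition(keys, n));
        }
    }
}
```

Adding a server changes where many keys live, so the software has to know how to route each request and how to move existing data. That is the kind of adjustment vertical scaling avoids and that the paragraph above attributes to horizontal scaling.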


Publisher Resources

ISBN: 9781836201052