大规模数据分析和建模:基于 Spark 与 R

by Javier Luraschi, Kevin Kuo, Edgar Ruiz
July 2020
Intermediate to advanced
262 pages
5h 34m
Chinese
China Machine Press
11.6 The context Parameter
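When using spark_apply() to process partitions, you may need to include auxiliary data that is small enough to fit on every node. This was the case in the grid search example, where the dataset was passed to all partitions and itself remained unpartitioned.
We can modify this chapter's initial f(x) = 10 * x example to customize the multiplier. It was originally fixed at 10, but we can make it configurable by passing it as the context parameter: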
library(sparklyr)

# Multiply every row by the value supplied through the context parameter
sdf_len(sc, 4) %>%
  spark_apply(
    function(data, context) context * data,
    context = 100
  )
# Source: spark<?> [?? x 1]
     id
  <dbl>
1   100
2   200
3   300
4   400
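Figure 11-5 illustrates this example conceptually. Notice that the data partitions are still present, but the context parameter is distributed to all the nodes.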
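[Figure 11-5: Map operation multiplying by the context parameter. Worker node (1) holds data partition (1) and worker node (2) holds data partition (2); each worker also receives a copy of context.]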
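To make Figure 11-5 concrete without a Spark cluster, here is a minimal base-R sketch of the same data flow: the rows are split into partitions, and the identical context value is handed to the function alongside every partition. The helper simulate_spark_apply() and its partitions argument are hypothetical; this only illustrates the idea and is not how sparklyr is implemented internally.

```r
# Hypothetical helper, NOT part of sparklyr: a local stand-in that mimics
# how spark_apply() hands each partition plus the shared context to f().
simulate_spark_apply <- function(df, f, context, partitions = 2) {
  # Assign contiguous blocks of rows to partitions (1, 1, 2, 2 for 4 rows)
  rows_per_part <- ceiling(nrow(df) / partitions)
  part_id <- ceiling(seq_len(nrow(df)) / rows_per_part)
  parts <- split(df, part_id)

  # Each "worker" sees only its own partition, but the same context value
  results <- lapply(parts, function(p) f(p, context))

  # Gather the per-partition results back into a single data frame
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

df <- data.frame(id = as.numeric(1:4))
result <- simulate_spark_apply(
  df,
  function(data, context) context * data,
  context = 100
)
print(result)  # id column: 100, 200, 300, 400
```

Because context is copied to every partition rather than split, each partition computes its share of the result independently, which is why the output matches the Spark example above.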
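The grid search example used this parameter to pass a DataFrame to each worker node. However, because the context parameter is serialized as an R object, it can contain anything. For instance, if you need to pass multiple values, or even multiple datasets, you can pass a list containing them.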
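Since context travels as a serialized R object, any value that survives a serialize()/unserialize() round trip can be bundled into it, including a list that mixes scalars and a data frame. The sketch below uses base R's serialize() only to illustrate that round trip; treating it as analogous to what sparklyr does is an assumption, not a description of its exact transport. The list fields m, b, and lookup are hypothetical.

```r
# A context bundling several values and a small dataset (hypothetical fields)
context <- list(
  m = 10,
  b = 2,
  lookup = data.frame(key = c("a", "b"), weight = c(0.5, 1.5))
)

# Serialize to raw bytes, as an R object would be shipped to a worker,
# then restore it on the "worker" side
payload <- serialize(context, connection = NULL)
restored <- unserialize(payload)

# A worker function can read any field from the restored context
f <- function(data, context) context$m * data + context$b
result <- f(data.frame(id = as.numeric(1:4)), restored)
print(result)
```

The restored list is identical to the original, so a worker function can access every field it carries, whether a single number or an entire data frame.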
The following example defines the function f(x) = m * x + b and runs it with m = 10 and b = 2:
sdf_len(sc, 4) %>%
  spark_apply(
    ~.y$m * .x + .y$b, ...

You might also like

机器学习实战:基于Scikit-Learn、Keras 和TensorFlow (原书第2 版)

Aurélien Géron
数字化转型:企业破局的34 个锦囊

Gary O’Brien, Xiao Guo, Mike Mason

Publisher Resources

ISBN: 9787111661016