Spark快速大数据分析(第2版) (Chinese edition of Learning Spark, 2nd Edition)

by Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee
November 2021
Intermediate to advanced
340 pages
10h 46m
Chinese
Posts & Telecom Press
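All computations expressed through the high-level Structured APIs are decomposed into optimized, generated low-level RDD operations, which are then converted into Scala bytecode and shipped to the executors' JVMs. This generated code is not visible to users, and it is not the same as the user-facing RDD API.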
2.3 Step 3: Understanding Spark Application Concepts
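At this point we have downloaded and installed Spark on our laptop, launched the Spark shell, and run a few short code examples interactively, so we are ready for the final step.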
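To understand what the sample code actually does under the hood, you first need to be familiar with some key concepts of a Spark application, and with how your code is transformed into tasks that run on the Spark executors. Let's begin by defining some important terms.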
Application
A user program built on Spark using the Spark APIs. It consists of a driver program and executors on the cluster.
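SparkSession
An object that provides the entry point for interacting with the underlying Spark functionality; it lets you write Spark programs with the Spark APIs. In the interactive Spark shell, the Spark driver has already created a SparkSession object for you, but in a Spark application you must create the SparkSession object yourself.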
Job
A parallel computation made up of many tasks, spawned in response to a Spark action (such as save() or collect()).
Stage
Each job is divided into smaller sets of tasks called stages. Stages can depend on one another.
Task
A single unit of work or execution that is sent to a Spark executor.
We will now look at each of these concepts in more depth.
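To make the relationship between jobs, stages, and tasks more concrete, here is a toy Python model. It does not use or imitate Spark's API; the `Step`, `Stage`, `Job`, and `submit_action` names are hypothetical. As a simplification of Spark's real scheduler, it assumes that a step requiring a shuffle (a wide dependency) starts a new stage and that every stage runs one task per partition, with the same partition count in every stage.

```python
from dataclasses import dataclass, field


# Toy model only: these classes are hypothetical and are not part of Spark.
@dataclass
class Step:
    name: str            # label for a transformation, e.g. "filter"
    wide: bool = False   # True if the step needs a shuffle (wide dependency)


@dataclass
class Stage:
    stage_id: int
    steps: list = field(default_factory=list)   # step names run in this stage
    tasks: list = field(default_factory=list)   # one task label per partition


@dataclass
class Job:
    action: str          # the action that triggered the job, e.g. "collect"
    stages: list


def submit_action(action, steps, num_partitions):
    """Build a job for an action: cut stages at shuffles, one task per partition."""
    stages = [Stage(0)]
    for step in steps:
        # A wide step starts a new stage, unless the current stage is still empty.
        if step.wide and stages[-1].steps:
            stages.append(Stage(len(stages)))
        stages[-1].steps.append(step.name)
    # Simplification: every stage uses the same number of partitions.
    for stage in stages:
        stage.tasks = [f"stage{stage.stage_id}-task{p}" for p in range(num_partitions)]
    return Job(action, stages)


if __name__ == "__main__":
    pipeline = [Step("read"), Step("filter"), Step("groupBy", wide=True), Step("agg")]
    job = submit_action("collect", pipeline, num_partitions=2)
    for stage in job.stages:
        print(stage.stage_id, stage.steps, stage.tasks)
```

With `collect` as the action and `groupBy` marked as needing a shuffle, the model yields two stages of two tasks each: `read` and `filter` in stage 0, `groupBy` and `agg` in stage 1. In real Spark the number of tasks in a post-shuffle stage is set by the shuffle partition count (for DataFrames, `spark.sql.shuffle.partitions`), so it can differ from the first stage.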
2.3.1 Spark Application and SparkSession
All Spark ...

ISBN: 9787115576019