Spark快速大数据分析(第2版)

by Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee
November 2021
Intermediate to advanced
340 pages
10h 46m
Chinese
Posts & Telecom Press
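Near-real-time processing
If your use case requires predictions with latency on the order of hundreds of milliseconds to a few seconds, you can build a prediction server that uses MLlib to generate the predictions. This is not an ideal use case for Spark, because each request processes only a very small amount of data, but it gives much lower latency than a streaming or batch solution.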
11.2.3 Model Export Patterns for Real-Time Inference
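Some domains require real-time inference, for example fraud detection and ad recommendation. Computing predictions for a small number of records may meet the low-latency requirement of a real-time inference endpoint, but you still have to handle load balancing (serving many concurrent requests) and, for latency-critical tasks, geolocation. Popular managed solutions such as AWS SageMaker and Azure ML provide low-latency model serving. This section shows how to export MLlib models so that they can be deployed to other services.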
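One way to export a model out of Spark is to reimplement it natively in Python, C, or another language. Extracting the model's coefficients looks fairly simple, but every feature engineering and preprocessing step (OneHotEncoder, VectorAssembler, and so on) has to be exported along with them, which makes the task complicated and error-prone.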
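To make the difficulty concrete, here is a minimal sketch of what a native Python reimplementation of a small pipeline might look like. Everything in it is an assumption made for illustration: the category mapping, coefficients, intercept, and the `predict` helper are hypothetical values standing in for what you would copy out of a fitted StringIndexer, OneHotEncoder, VectorAssembler, and linear regression model. It is not the book's code and it does not use Spark.

```python
from typing import Dict, List, Sequence, Union

# Hypothetical values, standing in for what would be copied out of a fitted
# MLlib pipeline (string index mapping, model coefficients, intercept).
ROOM_TYPE_INDEX: Dict[str, int] = {
    "Entire home/apt": 0,
    "Private room": 1,
    "Shared room": 2,
}
COEFFICIENTS: List[float] = [50.0, 20.0, 30.0]  # [one-hot slot 0, one-hot slot 1, bedrooms]
INTERCEPT = 10.0


def one_hot(index: int, num_categories: int, drop_last: bool = True) -> List[float]:
    """Encode a category index as a dense one-hot vector.

    With drop_last=True the last category is encoded as all zeros, so the
    vector has num_categories - 1 slots. The reimplementation must use the
    same setting as the original encoder or every prediction is shifted.
    """
    size = num_categories - 1 if drop_last else num_categories
    vec = [0.0] * size
    if index < size:
        vec[index] = 1.0
    return vec


def assemble(*parts: Union[float, Sequence[float]]) -> List[float]:
    """Concatenate scalars and vectors into one flat feature vector, in order."""
    out: List[float] = []
    for part in parts:
        if isinstance(part, (list, tuple)):
            out.extend(float(x) for x in part)
        else:
            out.append(float(part))
    return out


def predict(room_type: str, bedrooms: float) -> float:
    """Linear prediction: intercept + dot(coefficients, features)."""
    idx = ROOM_TYPE_INDEX[room_type]  # raises KeyError for an unseen category
    features = assemble(one_hot(idx, len(ROOM_TYPE_INDEX)), bedrooms)
    return INTERCEPT + sum(w * x for w, x in zip(COEFFICIENTS, features))


if __name__ == "__main__":
    print(predict("Entire home/apt", 2))
    print(predict("Shared room", 1))
```

Even in this toy case, the prediction is only correct if the category order, the dropped last category, the handling of unseen values, and the order in which features are assembled all match the original pipeline exactly. The dot product itself is the easy part.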
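A few open source libraries, such as MLeap and ONNX, can help you automatically export a supported subset of MLlib models and remove their dependency on Spark. At the time of this writing, however, the company that developed MLeap no longer supports it, and MLeap does not yet support Scala 2.12 or Spark 3.0. ONNX (Open Neural Network Exchange), on the other hand, has become an open standard for machine learning interoperability. You may know of some other machine learning interoperability formats ...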

Publisher Resources

ISBN: 9787115576019