Spark机器学习实战

by Posts & Telecom Press, Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei
May 2024
Beginner to intermediate
549 pages
8h 11m
Chinese
Packt Publishing

Chapter 13 Spark Streaming and Machine Learning Libraries

In this chapter, we will cover the following:

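  • Structured streaming for near real-time machine learning;
  • Streaming DataFrames for real-time machine learning;
  • Streaming Datasets for real-time machine learning;
  • Streaming data and debugging with queueStream;
  • Downloading and getting familiar with the well-known Iris data for unsupervised classification;
  • Streaming KMeans for a real-time online classifier;
  • Downloading the wine quality data for streaming regression;
  • Streaming linear regression for real-time regression;
  • Downloading the Pima diabetes data for supervised classification;
  • Streaming logistic regression for an online classifier.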

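Spark Streaming keeps evolving toward a unified, structured API that addresses the split between batch and stream processing. Spark Streaming has been practical to use since Spark 1.3 with Discretized Streams (DStreams). The new direction is to abstract the underlying framework behind an unbounded table model: users query the table with SQL or functional programming, and the result can be written to another output table in several modes (complete, incremental update, and append). The Spark SQL Catalyst optimizer and Tungsten (the off-heap memory manager) are now integral parts of Spark Streaming, which lets Spark programs execute efficiently.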
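The unbounded table model and its output modes can be illustrated without Spark at all. The following is a minimal plain-Python sketch, not Spark's API: the class `UnboundedTable`, its method `process_batch`, and the mode strings are hypothetical names chosen for illustration. It assumes a running word count as the aggregating query, and treats append mode as a pass-through query that simply emits the newly arrived rows.

```python
from collections import Counter


class UnboundedTable:
    """Toy stand-in for a streaming input table: rows are only ever appended."""

    def __init__(self):
        self.rows = []           # the conceptually unbounded input table
        self.counts = Counter()  # result table of a running "count per word" query

    def process_batch(self, new_rows, mode):
        """Append one micro-batch, refresh the result table, and return
        what would be written to the output table in the given mode."""
        self.rows.extend(new_rows)
        changed = Counter(new_rows)
        self.counts.update(changed)

        if mode == "complete":
            # Write the entire result table every time.
            return dict(self.counts)
        if mode == "update":
            # Write only the result rows that changed in this batch.
            return {word: self.counts[word] for word in changed}
        if mode == "append":
            # Assumption for this sketch: a non-aggregating, pass-through
            # query, so only the newly arrived rows are written.
            return list(new_rows)
        raise ValueError(f"unknown output mode: {mode}")


if __name__ == "__main__":
    table = UnboundedTable()
    print(table.process_batch(["a", "b", "a"], "complete"))
    print(table.process_batch(["b", "c"], "update"))
    print(table.process_batch(["a"], "append"))
```

The same input stream produces three different outputs depending on the mode, which is the choice a user makes when attaching a sink to a streaming query.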

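In this chapter, we not only cover the streaming tools currently available in Spark's machine learning library, but also include four introductory recipes that we found useful for building a better understanding of Spark 2.0.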

Figure 13-1 outlines what this chapter covers.

[Figure 13-1]

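Spark 2.0+ builds on the success of earlier releases and abstracts away some of the framework's inner workings, so that programmers using it need not worry about re-implementing exactly-once semantics themselves. Streaming has now evolved from RDD-based DStreams to structured streaming (structured ...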


Publisher Resources

ISBN: 9781836201830