Spark机器学习实战

by Posts & Telecom Press, Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei
May 2024
Beginner to intermediate
549 pages
8h 11m
Chinese
Packt Publishing
Content preview from Spark机器学习实战

Chapter 6 Practical Machine Learning with Regression and Classification in Spark 2.0: Part II

This chapter covers the following recipes:

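  • Linear regression with SGD optimization in Spark 2.0;
  • Logistic regression with SGD optimization in Spark 2.0;
  • Ridge regression with SGD optimization in Spark 2.0;
  • Lasso regression with SGD optimization in Spark 2.0;
  • Logistic regression with L-BFGS optimization in Spark 2.0;
  • Support vector machines (SVM) in Spark 2.0;
  • Naive Bayes classification with the MLlib library in Spark 2.0;
  • Exploring ML pipelines and DataFrames using logistic regression in Spark 2.0.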

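This chapter is the second part of our coverage of regression and classification in Spark 2.0, and it focuses on RDD-based regression. These algorithms are used in many existing Spark machine learning implementations. Because that code base exists, intermediate and advanced practitioners alike should be able to work with these techniques.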

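In this chapter, we use the Apache Spark API to implement a simple application with a range of regression algorithms (linear, logistic, ridge, and Lasso regression) optimized with stochastic gradient descent (SGD) and L-BFGS, together with powerful linear classification algorithms such as support vector machines (SVM) and Naive Bayes. Each recipe is supplemented with goodness-of-fit metrics (for example MSE, RMSE, ROC, and binary and multiclass metrics) to show the power and completeness of Spark MLlib. We first cover RDD-based linear, logistic, ridge, and Lasso regression, and then use SVM and Naive Bayes to introduce more sophisticated classifiers.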
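To make the idea of SGD-optimized regression and its fit metrics concrete, here is a minimal sketch in plain Python. It is not Spark MLlib's implementation and does not use the Spark API; the function names, the step size, the epoch count, and the noise-free data set are all assumptions chosen for illustration.

```python
import math
import random


def sgd_linear_regression(points, step_size=0.05, epochs=200, seed=42):
    """Fit y ~ w * x + b with plain per-sample stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(points)
    for _ in range(epochs):
        rng.shuffle(data)                # visit the samples in a new random order
        for x, y in data:
            error = (w * x + b) - y      # residual for this single sample
            w -= step_size * error * x   # gradient of 0.5 * error^2 with respect to w
            b -= step_size * error       # gradient of 0.5 * error^2 with respect to b
    return w, b


def mse(points, w, b):
    """Mean squared error of the fitted line on the given points."""
    return sum(((w * x + b) - y) ** 2 for x, y in points) / len(points)


# Hypothetical noise-free data drawn from y = 2x + 1 (an assumption for this sketch)
points = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(-10, 11)]

w, b = sgd_linear_regression(points)
train_mse = mse(points, w, b)
train_rmse = math.sqrt(train_mse)        # RMSE is the square root of MSE
print(f"w={w:.3f} b={b:.3f} MSE={train_mse:.6f} RMSE={train_rmse:.6f}")
```

The recipes in the chapter rely on Spark's distributed RDD-based algorithms instead; this sketch only shows the update rule and how MSE and RMSE relate to each other.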

Figure 6-1 shows the regression and classification algorithms covered in this chapter.

[Figure 6-1]

Tip

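Problems have been reported with SGD-based regression algorithms in real-world applications. These problems are most likely caused by SGD being poorly tuned for systems with a large number of parameters, or by a misunderstanding of the strengths and weaknesses of SGD as an optimization technique. ...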
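One common tuning issue of this kind is the step size. The sketch below, again in plain Python and not taken from the book or from Spark, runs the same per-sample update with three hypothetical step sizes on a small noise-free data set. Samples are visited in a fixed order so the run is deterministic, which is a simplification of true SGD.

```python
import math


def final_mse(step_size, epochs=50):
    """Run per-sample gradient updates on y = 2x + 1 and return the training MSE."""
    # Hypothetical noise-free data with x in [-1, 1] (an assumption for this sketch)
    points = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(-10, 11)]
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in points:
            error = (w * x + b) - y
            w -= step_size * error * x
            b -= step_size * error
            if not (math.isfinite(w) and math.isfinite(b)):
                return math.inf          # the iterates blew up: report divergence
    squared = [((w * x + b) - y) * ((w * x + b) - y) for x, y in points]
    return sum(squared) / len(points)


# Compare a small, a moderate, and an overly large step size
results = {step: final_mse(step) for step in (0.05, 0.5, 3.0)}
for step, value in results.items():
    print(f"step_size={step}: MSE={value}")
```

With the two smaller step sizes the error shrinks toward zero, while the largest one makes every update overshoot, so the error grows instead of falling. The same optimizer can therefore look broken when only its tuning is at fault.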


Publisher Resources

ISBN: 9781836201830