Spark机器学习实战

by Posts & Telecom Press, Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei
May 2024
Beginner to intermediate
549 pages
8h 11m
Chinese
Packt Publishing

Chapter 2: Linear Algebra Libraries for Machine Learning with Spark

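In this chapter, we will cover the following recipes:

  • Package imports and initial setup for Vector and Matrix;
  • Creating and configuring a DenseVector with Spark 2.0;
  • Creating and configuring a SparseVector with Spark 2.0;
  • Creating and configuring a DenseMatrix with Spark 2.0;
  • Using a local SparseMatrix with Spark 2.0;
  • Performing Vector arithmetic with Spark 2.0;
  • Performing Matrix arithmetic with Spark 2.0;
  • Exploring the distributed RowMatrix in Spark 2.0;
  • Exploring the distributed IndexedRowMatrix in Spark 2.0;
  • Exploring the distributed CoordinateMatrix in Spark 2.0;
  • Exploring the distributed BlockMatrix in Spark 2.0.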

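Linear algebra is the foundation of machine learning (ML) and mathematical programming (MP). When working with Spark's machine learning libraries, keep in mind that the Vector/Matrix structures available in Scala (imported by default) are not the same as the Vector and Matrix facilities in Spark ML and MLlib. The latter are the data structures to use, backed by RDDs, when you want Spark (that is, parallel computation) to handle large-scale Matrix/Vector computation (for example, SVD implementations with higher numerical precision, which derivatives pricing and risk analysis call for in some domains). The Scala Vector/Matrix libraries offer a rich set of linear algebra operations (dot product, addition, and so on) and still have their place in a machine learning pipeline. In short, the key difference between Scala Breeze and Spark/Spark ML is that Spark is backed by RDDs, so it supports distributed, concurrent computation and resilience at the same time, without any additional concurrency module or other overhead.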
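The distinction above can be sketched in a few lines of Scala. This is a minimal illustration, not code from the book: it assumes Spark 2.x or later with `spark-mllib` on the classpath, and the object name `LinalgTypesSketch`, the sample values, and the local master setting are hypothetical choices made for the example.

```scala
import org.apache.spark.mllib.linalg.{Vector => SparkVector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.sql.SparkSession

// Hypothetical sketch object; the name and values are illustrative only.
object LinalgTypesSketch {
  // Scala's default-imported Vector is a general-purpose immutable
  // collection, not a Spark linear algebra type.
  val scalaVec: Vector[Double] = Vector(1.0, 0.0, 3.0)

  // Spark MLlib local vectors are created through the Vectors factory.
  val dense: SparkVector = Vectors.dense(1.0, 0.0, 3.0)
  // Sparse form: size, indices of the non-zero entries, and their values.
  val sparse: SparkVector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough for experimentation.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("LinalgTypesSketch")
      .getOrCreate()

    // A distributed RowMatrix wraps an RDD of local vectors, one per row.
    val rows = spark.sparkContext.parallelize(Seq(dense, sparse))
    val mat = new RowMatrix(rows)
    println(s"rows = ${mat.numRows()}, cols = ${mat.numCols()}")

    spark.stop()
  }
}
```

The import renames Spark's `Vector` to `SparkVector` so that it does not shadow the Scala collection of the same name, which is the source of confusion the paragraph warns about.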

Publisher Resources

ISBN: 9781836201830