Spark机器学习实战 (Spark Machine Learning in Action)

by Posts & Telecom Press, Siamak Amirghodsi, Meenakshi Rajendran, Broderick Hall, Shuen Mei
May 2024
Beginner to intermediate
549 pages
Chinese
Packt Publishing

Chapter 7 Building a Large-Scale Recommendation Engine with Spark

This chapter covers the following recipes:

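  • Generating the data needed for a scalable recommendation engine with Spark 2.0
  • Exploring the movie data for a recommendation system with Spark 2.0
  • Exploring the rating data for a recommendation system with Spark 2.0
  • Building a scalable recommendation engine with Spark 2.0 and collaborative filtering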

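In the previous chapters, we used short recipes and greatly simplified code to demonstrate the basic building blocks and concepts of Spark's machine learning library. This chapter presents a more complete, advanced application that uses Spark's APIs and tools to tackle a problem in a specific machine learning domain. Although the chapter contains fewer recipes, you will learn more about what a real machine learning application involves.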

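In this chapter, we explore recommendation systems and their implementation using a latent factor model based on Alternating Least Squares (ALS) matrix factorization. In short, when we try to factorize a large user-item rating matrix into two lower-rank, thinner matrices, we often face a nonlinear or non-convex optimization problem that is hard to solve directly. It happens that we are very good at solving convex optimization problems. If we fix one of the two matrices, solving for the other becomes an ordinary (convex) least-squares problem, so we solve for one part, then the other, and go back and forth many times (hence "alternating"). This lets us use existing parallel optimization techniques to solve the factorization, and thereby discover a set of latent factors, more effectively.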
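The alternating idea can be made concrete with a minimal, dependency-free sketch in plain Python. This is not the book's Spark code and not Spark's implementation. It assumes that the ratings arrive as `(user, item, rating)` triples and uses plain L2 regularization with a weight `reg`. The helper names `solve`, `update`, `als`, `predict` and `objective` are hypothetical.

With the item factors held fixed, each user's factor vector is the solution of a small ridge-regression problem over that user's observed ratings, and the same holds for items when the user factors are fixed. Every half-step is therefore an exact convex solve.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def update(fixed, observed, rank, reg):
    """Hold the `fixed` factors constant and solve one ridge regression per row."""
    out = {}
    for row, entries in observed.items():
        # Normal equations over the observed entries only:
        # (sum f f^T + reg * I) x = sum r * f
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col, r in entries:
            f = fixed[col]
            for i in range(rank):
                b[i] += f[i] * r
                for j in range(rank):
                    a[i][j] += f[i] * f[j]
        out[row] = solve(a, b)
    return out


def als(ratings, rank=2, reg=0.1, iterations=20, seed=0):
    """Factor sparse (user, item, rating) triples into user and item factor vectors."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    item_f = {i: [rng.uniform(0.0, 1.0) for _ in range(rank)] for i in by_item}
    user_f = {}
    for _ in range(iterations):
        user_f = update(item_f, by_user, rank, reg)  # items fixed -> solve users
        item_f = update(user_f, by_item, rank, reg)  # users fixed -> solve items
    return user_f, item_f


def predict(user_f, item_f, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(x * y for x, y in zip(user_f[u], item_f[i]))


def objective(ratings, user_f, item_f, reg):
    """Squared error on observed ratings plus L2 penalty; never increases across iterations."""
    err = sum((r - predict(user_f, item_f, u, i)) ** 2 for u, i, r in ratings)
    penalty = sum(x * x for f in list(user_f.values()) + list(item_f.values()) for x in f)
    return err + reg * penalty
```

For example, `als([(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)], rank=2)` returns dictionaries of user and item factor vectors. `predict` can then score any user-item pair, including pairs that were never rated.

Each row's solve depends only on the fixed factors, so the solves within a half-step are independent of one another and parallelize naturally. In Spark, the distributed counterpart is `org.apache.spark.ml.recommendation.ALS`. Its `rank`, `regParam` and `maxIter` parameters play roughly the roles of `rank`, `reg` and `iterations` here, although its regularization details differ.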

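This chapter uses a popular dataset, the MovieLens dataset, to implement the recommendation engine. Unlike the other chapters, it uses two recipes to explore the data and shows how to add graphics components such as JFreeChart to your Spark machine learning toolkit.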
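Exploring rating data usually starts from simple counts over the raw records. The following is a minimal plain-Python sketch. It assumes the `::`-delimited layout of the MovieLens 1M `ratings.dat` file (`UserID::MovieID::Rating::Timestamp`). The book's recipes do this with Spark and may use a different MovieLens release or file format. The sample lines and the helper names `parse_ratings` and `summarize` are made up for illustration.

```python
from collections import Counter

# Made-up sample lines in the MovieLens 1M "ratings.dat" layout (an assumption):
# UserID::MovieID::Rating::Timestamp
SAMPLE = """\
1::10::5::978300760
1::20::3::978302109
2::10::4::978298413
3::20::5::978297867
"""


def parse_ratings(lines):
    """Yield (user_id, movie_id, rating, timestamp) tuples from '::'-delimited lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, movie, rating, ts = line.split("::")
        yield int(user), int(movie), float(rating), int(ts)


def summarize(records):
    """Basic counts, a rating histogram and the mean rating for parsed records."""
    records = list(records)
    ratings = [r for _, _, r, _ in records]
    return {
        "num_ratings": len(records),
        "num_users": len({u for u, _, _, _ in records}),
        "num_movies": len({m for _, m, _, _ in records}),
        "histogram": dict(sorted(Counter(ratings).items())),
        "mean_rating": sum(ratings) / len(ratings),
    }


print(summarize(parse_ratings(SAMPLE.splitlines())))
# {'num_ratings': 4, 'num_users': 3, 'num_movies': 2, 'histogram': {3.0: 1, 4.0: 1, 5.0: 2}, 'mean_rating': 4.25}
```

Summaries such as the number of distinct users and movies and the rating histogram are the kind of data you would then pass to a charting library such as JFreeChart.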

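Figure 7-1 shows the concepts covered in this chapter and the flow of its recipes, which together demonstrate an ALS (Alternating Least Squares) recommendation application.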

Figure 7-1

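Recommendation engines have been around for a long time and were widely used in the early e-commerce systems of the 1990s. Their techniques ranged from hard-coded product associations to content-based recommendations driven by profiling. Modern recommender systems use collaborative filtering (CF) to address the shortcomings of those earlier systems. CF also helps meet the scale and latency requirements (for example, 100 ms or less) needed to compete in modern commerce systems (for example, Amazon, Netflix, eBay, News, and so on). ...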
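Latent factor models are one way to meet tight latency budgets. Once the user and item factors have been trained offline, scoring a user against candidate items reduces to one dot product per item. The sketch below is plain Python with a hypothetical `recommend` helper and made-up factor values. It returns the top-k items for a single user and illustrates the idea; it is not a production serving path.

```python
import heapq


def recommend(user_vec, item_factors, k, exclude=()):
    """Return the ids of the k items with the highest dot-product score for one user."""
    excluded = set(exclude)
    scored = (
        (sum(x * y for x, y in zip(user_vec, vec)), item)
        for item, vec in item_factors.items()
        if item not in excluded
    )
    # nlargest keeps only k candidates in a heap instead of sorting every score
    return [item for _, item in heapq.nlargest(k, scored)]


# Made-up 2-dimensional factors for three items
factors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print(recommend([2.0, 1.0], factors, 2))                 # ['c', 'a']
print(recommend([2.0, 1.0], factors, 2, exclude=["c"]))  # ['a', 'b']
```

For the user vector `[2.0, 1.0]`, the scores for `a`, `b` and `c` are about 2.0, 1.0 and 2.1. Excluding items the user has already rated is a common use of the `exclude` argument. For batch scoring, Spark's `ALSModel` offers `recommendForAllUsers` in Spark 2.2 and later.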

ISBN: 9781836201830