机器学习流水线实战 (Building Machine Learning Pipelines)

by Hannes Hapke, Catherine Nelson
November 2021
Intermediate to advanced
302 pages
8h 57m
Chinese
Posts & Telecom Press
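At this point, it is essential that such a pipeline is set up robustly. The pipeline should fail only when an influx of new data pushes the data statistics outside the limits set in data validation, or pushes the model statistics outside the bounds set in model analysis. Such a failure can then trigger events such as model retraining or new feature engineering. If one of these triggers fires, the new model should receive a new version number.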
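The trigger logic described above can be sketched in plain Python. This is a minimal illustration, not the book's implementation: the names `Bounds`, `out_of_bounds`, and `next_version`, the statistic names, and the integer versioning scheme are all assumptions made for the example. In a real pipeline the limits would come from the data validation and model analysis steps.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Bounds:
    """Inclusive lower and upper limit for one monitored statistic (hypothetical)."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def out_of_bounds(stats: Dict[str, float], limits: Dict[str, Bounds]) -> List[str]:
    """Return the sorted names of statistics that fall outside their limits."""
    return sorted(
        name
        for name, bounds in limits.items()
        if name in stats and not bounds.contains(stats[name])
    )


def next_version(
    current_version: int,
    data_stats: Dict[str, float],
    data_limits: Dict[str, Bounds],
    model_stats: Dict[str, float],
    model_limits: Dict[str, Bounds],
) -> Tuple[int, List[str]]:
    """Bump the version only if a data or model statistic violates its limits.

    Returns the (possibly unchanged) version and the list of violations
    that would trigger retraining or new feature engineering.
    """
    violations = out_of_bounds(data_stats, data_limits) + out_of_bounds(
        model_stats, model_limits
    )
    if violations:
        return current_version + 1, violations
    return current_version, violations


if __name__ == "__main__":
    data_limits = {"mean_age": Bounds(20.0, 60.0)}
    model_limits = {"accuracy": Bounds(0.80, 1.00)}
    version, why = next_version(
        3, {"mean_age": 72.5}, data_limits, {"accuracy": 0.91}, model_limits
    )
    print(version, why)
```

Keeping the check separate from the version bump means the same violation list can be logged or used to decide which downstream event (retraining, feature engineering) to start.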
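Besides collecting new training data, feedback loops can also provide information about how the model is actually used. This can include the number of active users, when they interact with it, and many other pieces of data. This kind of data is very useful for demonstrating the value of the model to business stakeholders.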
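Feedback Loops Can Be Dangerous
Feedback loops can also have negative consequences and should be approached with caution. If you feed the model's predictions back into new training data with no human input, the model will learn from its mistakes as well as from its correct predictions. Feedback loops can also amplify any biases or inequalities present in the original data. Careful model analysis can help you spot some of these situations.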
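One way to respect this warning is to admit only human-reviewed examples into new training data. The sketch below assumes a hypothetical `LoggedPrediction` record with an optional `human_label` field; neither the record layout nor the function name comes from the book.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LoggedPrediction:
    """One served prediction, optionally reviewed by a person (hypothetical)."""
    features: str
    predicted_label: str
    human_label: Optional[str] = None  # None means nobody reviewed it


def build_training_rows(log: List[LoggedPrediction]) -> List[Tuple[str, str]]:
    """Keep only reviewed predictions and use the human label as the target.

    Unreviewed predictions are dropped, so the model never trains on its
    own unverified output.
    """
    return [
        (entry.features, entry.human_label)
        for entry in log
        if entry.human_label is not None
    ]


if __name__ == "__main__":
    log = [
        LoggedPrediction("review text A", "positive", human_label="positive"),
        LoggedPrediction("review text B", "positive", human_label="negative"),
        LoggedPrediction("review text C", "negative"),  # never reviewed
    ]
    print(build_training_rows(log))
```

This filter does not remove bias that is already in the reviewed labels; it only prevents unverified predictions from being recycled as ground truth.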
13.1 Explicit and Implicit Feedback
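Feedback can be divided into two main types: explicit and implicit. Explicit feedback is direct input from a user on a prediction, for example giving a thumbs up or thumbs down to a shopping or movie recommendation from a recommender system, or correcting a prediction. Implicit feedback is feedback that people's behavior provides to the model during normal use of the product, for example buying something the recommender system suggested or watching a recommended movie. User privacy needs careful consideration with implicit feedback, because it is easy to end up tracking every action a user takes.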
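The distinction between the two feedback types can be made concrete with a small data structure. The following is an illustrative sketch only: the action names, `FeedbackKind`, `FeedbackEvent`, and `to_feedback` are assumptions for this example. Actions outside the two allow-lists are deliberately not recorded, which is one simple way to avoid tracking everything a user does.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FeedbackKind(Enum):
    EXPLICIT = "explicit"  # the user comments directly on a prediction
    IMPLICIT = "implicit"  # inferred from normal use of the product


# Hypothetical action names for a recommender system.
EXPLICIT_ACTIONS = frozenset({"thumbs_up", "thumbs_down", "correction"})
IMPLICIT_ACTIONS = frozenset({"purchase", "watch"})


@dataclass(frozen=True)
class FeedbackEvent:
    item_id: str
    action: str
    kind: FeedbackKind


def to_feedback(item_id: str, action: str) -> Optional[FeedbackEvent]:
    """Classify a user action, or return None if it should not be tracked."""
    if action in EXPLICIT_ACTIONS:
        return FeedbackEvent(item_id, action, FeedbackKind.EXPLICIT)
    if action in IMPLICIT_ACTIONS:
        return FeedbackEvent(item_id, action, FeedbackKind.IMPLICIT)
    return None  # not on an allow-list, so it is not recorded


if __name__ == "__main__":
    for action in ("thumbs_down", "watch", "scroll"):
        print(action, to_feedback("movie-42", action))
```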
13.1.1 The Data Flywheel
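In some cases, you may already have all the data you need to create a new product powered by machine learning. In other cases, you may need to collect more. When dealing with supervised learning problems ...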
Publisher Resources

ISBN: 9787115573216