精通特征工程 (Mastering Feature Engineering)
by Alice Zheng, Amanda Casari
April 2019
Intermediate to advanced
172 pages
Chinese
Posts & Telecom Press
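"Yes, yes," you might say, "but this is the era of big data, and that will solve our problems! Can't we just get better results with more data?" Maybe, but even big data cannot make up for the damage done by poor data and poor feature selection.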
Figure 9-4: Machine Learning (https://xkcd.com/1838/)
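The brute-force approach so far is too slow and is a long way from intelligent, iterative feature engineering. Next, let's try some new feature engineering techniques and see whether they speed up computation and lead us to more suitable features and a better way to search for results.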
9.3 Second Pass: More Feature Engineering and a Smarter Model
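Our initial approach built a huge, sparse array and then brute-forced its way through it with a filter. There are several ways to improve on this. The next step focuses on applying better techniques to the two initial features and on modifying the item-based collaborative filtering method so that iterations run faster.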
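As a rough picture of the item-based collaborative filtering step being optimized here, the sketch below computes cosine similarity between one paper and all others directly on a sparse paper-by-feature matrix. The matrix is never densified, and the full n x n similarity matrix is never built. It assumes SciPy and scikit-learn are available. The toy matrix and the helper `top_k_similar` are hypothetical and are not the book's code.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: 4 papers (rows) x 5 binary features (columns),
# stored sparsely as in the first-pass approach.
items = sparse.csr_matrix(np.array([
    [1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
], dtype=float))

def top_k_similar(matrix, item_idx, k=2):
    """Return indices of the k items most similar to item_idx (excluding itself).

    Hypothetical helper: compares a single row against every row, so only a
    1 x n similarity vector is produced instead of an n x n matrix.
    """
    # cosine_similarity accepts sparse input and returns a dense (1, n) array here
    sims = cosine_similarity(matrix[item_idx], matrix).ravel()
    sims[item_idx] = -np.inf  # never recommend the query paper to itself
    return np.argsort(-sims)[:k]

print(top_k_similar(items, 0))
```

Scoring one query row at a time keeps memory proportional to the number of papers. The brute-force alternative materializes every pairwise score.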
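First, let's try out some of the great feature engineering techniques covered in this book on the two variables in our hypothesis. After examining the features more closely, we can pick the techniques that suit each variable and turn the variables into "better" features for the recommender system.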
Academic Paper Recommender, Take 2
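First, the year of publication. Section 2.2.2 explained why raw counts are poor features for methods that rely on similarity measures. Example 9-6 and Figure 9-5 look at how to transform year so that it is better suited to the model we have chosen.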
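This small hypothetical illustration makes the Section 2.2.2 point concrete. When a large raw value such as a publication year (around 2000) sits in the same vector as smaller features, it dominates the similarity measure. The two papers and their [year, citation_count] values below are invented for this sketch, and `cosine` is a hypothetical helper.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors (hypothetical helper)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented papers described by raw [year, citation_count] values.
paper_a = np.array([1995.0, 3.0])
paper_b = np.array([2015.0, 250.0])

# The raw year (~2000) dominates both vectors, so two very different papers
# still point in almost the same direction.
print(cosine(paper_a, paper_b))
```

paper_b has roughly 80 times as many citations as paper_a. Even so, the cosine similarity comes out above 0.99, because the year component sets the direction of both vectors.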
Example 9-6. Fixed-width binning + dummy coding (part 1)
>>> print("Year spread: ", ...
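The listing above is cut off. The code below is a separate, minimal sketch of the technique named in its title; it does not reconstruct the book's listing. It sorts a handful of made-up years into fixed 10-year bins and then dummy-codes the bins with pandas. The sample years and the 10-year bin width are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical publication years (not the book's data set).
df = pd.DataFrame({"year": [1987, 1993, 1999, 2004, 2008, 2015, 2017]})

# Fixed-width binning: map each year to the start of its 10-year bin
# (1987 -> 1980, 1993 -> 1990, ...). The width of 10 is an assumption.
bin_width = 10
df["year_bin"] = (df["year"] // bin_width) * bin_width

# Dummy coding: one 0/1 column per bin, dropping the first bin as the
# reference category (encoded as all zeros).
dummies = pd.get_dummies(df["year_bin"], prefix="year", drop_first=True, dtype=int)
print(dummies)
```

With `drop_first=True`, the first bin (the 1980s here) becomes the reference category encoded as all zeros. This k-1 column layout distinguishes dummy coding from one-hot encoding.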
ISBN: 9787115509680