精通特征工程 (Mastering Feature Engineering)
by Alice Zheng, Amanda Casari
April 2019
Intermediate to advanced
172 pages
4h 39m
Chinese
Posts & Telecom Press
9 Back to the Feature: Building an Academic Paper Recommender
"In mathematics you don't understand things. You just get used to them."
- John von Neumann
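When you first see the path from data to results in Figure 1-1, it may well feel overwhelming. Throughout this book we have focused on the basic principles of feature engineering, using toy models and simple, clean datasets; these examples were deliberately designed to be illustrative and instructive.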
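Machine learning examples usually show the ideal case and the best results, which hides how hard it is to travel the path described in this book. Now that the groundwork is laid, we leave the simple world of simulated data behind and take on feature engineering with a real, structured dataset. At each stage we look at how to generate features from raw data, how to transform them, and what trade-offs feature engineering involves.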
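A word up front: the goal of this end-to-end example is not to build the best model for the dataset. Rather, it is to show several techniques from this book in practical use, and how to dig deeper into whether each technique adds value to the modeling process.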
9.1 Item-Based Collaborative Filtering
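Our task is to build an academic paper recommender from a subsample of the Microsoft Academic Graph. Such a recommender is very handy for people who need to find papers to cite but have not yet discovered Google Scholar. Here are some statistics about the dataset.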
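The core step of an item-based approach is scoring how similar two items are, then recommending the items closest to one the user already cares about. The sketch below is a minimal illustration under stated assumptions, not the chapter's actual pipeline: each paper is assumed to be a hypothetical binary row vector (for example, which references it cites), similarity is cosine similarity, and the names `cosine_similarity_matrix`, `recommend_similar`, and the toy `papers` matrix are made up for this example.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows (items) of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    X_normed = X / norms
    return X_normed @ X_normed.T


def recommend_similar(X, query_idx, k=2):
    """Return indices of the k items most similar to item `query_idx`."""
    sims = cosine_similarity_matrix(X)[query_idx].copy()
    sims[query_idx] = -np.inf  # never recommend the query item itself
    # Sort by descending similarity; "stable" keeps lower indices first on ties
    order = np.argsort(-sims, kind="stable")
    return order[:k].tolist()


# Hypothetical toy data: 4 papers x 5 binary features
# (e.g. whether a paper cites a given reference)
papers = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
])

print(recommend_similar(papers, query_idx=0, k=2))  # prints [1, 3]
```

Paper 1 shares two features with paper 0 and paper 3 shares one, so they rank first and second; paper 2 shares none. In a real recommender the choice of item representation, which is exactly the feature engineering question this chapter explores, matters far more than the similarity function.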
ISBN: 9787115509680