Python机器学习基础教程

by Andreas C. Müller, Sarah Guido
January 2018
Intermediate to advanced
301 pages
8h 54m
Chinese
Posts & Telecom Press
Content preview from Python机器学习基础教程
Chapter 4. Data Representation and Feature Engineering
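So far, we have assumed that the data is a two-dimensional array of floating-point numbers, where each column is a continuous feature that describes the data points. For many applications, this is not how the data is collected. One particularly common type of feature is the categorical feature, also called a discrete feature. Such features are usually not numeric. The distinction between categorical and continuous features is analogous to the distinction between classification and regression, except that it applies to the input rather than the output. Examples of continuous features we have already seen include pixel brightness and the size measurements of flowers. Examples of categorical features include the brand of a product, its color, or the department it is sold in (books, clothing, hardware). These are all properties that describe a product, but they do not vary in a continuous way. A product belongs either to the clothing department or to the books department. There is no department halfway between books and clothing, and the categories have no order (books is neither greater nor less than clothing, hardware does not lie between books and clothing, and so on).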
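As a rough illustration of this distinction, the sketch below separates the columns of a small product table into continuous and categorical features. The records, the field names (including `weight_kg`), and the helper `split_feature_types` are all hypothetical, and the type-based rule is only a heuristic assumed for this example.

```python
# Hypothetical product records; field names and values are made up for illustration.
products = [
    {"brand": "Acme", "color": "red", "department": "books", "weight_kg": 0.4},
    {"brand": "Bolt", "color": "blue", "department": "clothing", "weight_kg": 0.25},
    {"brand": "Acme", "color": "red", "department": "hardware", "weight_kg": 1.8},
]


def split_feature_types(rows):
    """Heuristic: treat numeric columns as continuous, all others as categorical."""
    continuous, categorical = [], []
    for name in rows[0]:
        values = [row[name] for row in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        (continuous if is_numeric else categorical).append(name)
    return continuous, categorical


continuous, categorical = split_feature_types(products)
print(continuous)   # ['weight_kg']
print(categorical)  # ['brand', 'color', 'department']

# A categorical feature is just a set of distinct values. Sorting here is only
# for stable display; the alphabetical order carries no meaning.
departments = sorted({row["department"] for row in products})
print(departments)  # ['books', 'clothing', 'hardware']
```

The heuristic can misfire: a department stored as an integer code would look numeric even though it is still categorical, so in practice the feature type has to come from knowledge of the data rather than from the storage type alone.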
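Regardless of the types of features your data contains, how you represent the data can have an enormous effect on the performance of machine learning models. We saw in Chapters 2 and 3 that scaling the data is important. In other words, if you do not rescale your data (say, to unit variance), then it makes a difference whether a measurement is expressed in centimeters or in inches. We also saw in Chapter 2 that it can help to augment the data with additional features, such as interactions (products) of features or more general polynomials.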
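Both points can be checked on a toy example. The following is a minimal sketch using only the Python standard library; the measurements are made up, and the helper names `standardize` and `add_degree2_terms` are assumptions for this illustration rather than functions from any particular library.

```python
from itertools import combinations_with_replacement
from statistics import mean, pstdev


def standardize(values):
    """Rescale a list of numbers to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


def add_degree2_terms(row):
    """Append all pairwise products of the features (including squares)."""
    products = [a * b for a, b in combinations_with_replacement(row, 2)]
    return list(row) + products


# Hypothetical flower measurements in centimeters, and the same data in inches.
lengths_cm = [1.4, 4.7, 5.1, 6.0, 1.3]
lengths_in = [v / 2.54 for v in lengths_cm]

# The raw values differ by a constant factor, but after scaling to unit
# variance the two representations agree up to floating-point rounding.
scaled_cm = standardize(lengths_cm)
scaled_in = standardize(lengths_in)
units_agree = all(abs(a - b) < 1e-9 for a, b in zip(scaled_cm, scaled_in))
print(units_agree)  # True

# Augment one sample (x1, x2) with x1*x1, x1*x2, and x2*x2.
print(add_degree2_terms([2.0, 3.0]))  # [2.0, 3.0, 4.0, 6.0, 9.0]
```

Without the call to `standardize`, a model that is sensitive to feature magnitude would see the centimeter and inch versions as different inputs, which is the unit dependence described above.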
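The question of how to find the best representation of the data for a particular application is known as feature engineering, and it is one of the main tasks of data scientists and machine learning practitioners trying to solve real-world problems ...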

Publisher Resources

ISBN: 9787115475619