Mastering Data Science Algorithms (精通数据科学算法)

by Posts & Telecom Press, David Natingga
May 2024
Intermediate to advanced
181 pages
3h 9m
Chinese
Packt Publishing

Chapter 1 Solving Classification Problems with the k-Nearest Neighbors Algorithm

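The nearest neighbor algorithm determines the class of a data instance from its neighbors. The k-nearest neighbors (k-NN) algorithm finds the most representative class among the k neighbors closest to the instance and assigns that class to the instance.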
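The rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the function name `knn_classify`, the toy samples, and the choice of Euclidean distance are all assumptions made for this example.

```python
import math
from collections import Counter

def knn_classify(samples, query, k):
    """Return the majority label among the k samples closest to query.

    samples: list of (features, label) pairs, where features is a tuple of numbers.
    ASSUMPTION: Euclidean distance; the text has not fixed a metric at this point.
    """
    # Sort all samples by distance to the query and keep the k closest.
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    # Count the labels of those k neighbors and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data, invented for illustration only.
toy_samples = [
    ((0, 0), "A"), ((1, 0), "A"),
    ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B"),
]
print(knn_classify(toy_samples, (1, 1), k=3))  # two "A" neighbors outvote one "B"
```

When votes are tied, `Counter.most_common` returns the label that was counted first, which here is the label of the nearer neighbor. Choosing an odd k for a two-class problem avoids ties altogether.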

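This chapter introduces the basics of the k-NN algorithm and uses a simple example, Mary's temperature preferences, to explain and implement it. With an example map of Italy, you will learn how to choose a value of k that lets the algorithm work correctly and reach the highest accuracy. From the house preferences example, you will learn how to rescale the numerical parameters used by the k-NN algorithm. In the text classification example, you will learn how to choose a good metric for the distance between data points, and how to eliminate irrelevant dimensions in a high-dimensional space so that the algorithm works correctly.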

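For example, if Mary feels cold at 10℃ but warm at 25℃, then in a room at 22℃ the nearest neighbor algorithm guesses that she will feel warm, because 22℃ is closer to 25℃ than to 10℃.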
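The reasoning in this one-dimensional example can be checked with a short sketch. The names `known` and `nearest_label` are hypothetical, and the labels "Cold" and "Warm" follow the wording of Table 1-1.

```python
# The two observations from the text: (temperature in °C, how Mary feels).
known = [(10, "Cold"), (25, "Warm")]

def nearest_label(temperature):
    # Pick the observation whose temperature is closest to the query.
    _, label = min(known, key=lambda obs: abs(obs[0] - temperature))
    return label

# |22 - 25| = 3 is smaller than |22 - 10| = 12, so the nearest neighbor is 25 °C.
print(nearest_label(22))  # prints: Warm
```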

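The previous example tells us when Mary feels warm or cold based on temperature alone. However, wind speed is also a factor when Mary is asked whether she feels warm or cold, as shown in Table 1-1.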

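Table 1-1

| Temperature (℃) | Wind speed (km/h) | Mary's preference |
| --- | --- | --- |
| 10 | 0 | Cold |
| 25 | 0 | Warm |
| 15 | 5 | Cold |
| 20 | 3 | Warm |
| 18 | 7 | Cold |
| 20 | 10 | Cold |
| 22 | 5 | Warm |
| 24 | 6 | Warm |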

Plotting this data gives the result shown in Figure 1-1.

Figure 1-1

Now, suppose we use the 1-NN algorithm to determine how Mary feels at a temperature of 16℃ and a wind speed of 3 km/h, as shown in Figure 1-2. ...
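As a rough sketch of this 1-NN query, the data from Table 1-1 can be searched for the single closest observation. The excerpt does not say which distance metric is used, so Euclidean distance on the raw, unscaled values is an assumption here, and the names `observations` and `one_nn` are hypothetical.

```python
import math

# Table 1-1: (temperature in °C, wind speed in km/h) -> Mary's preference.
observations = [
    ((10, 0), "Cold"), ((25, 0), "Warm"), ((15, 5), "Cold"), ((20, 3), "Warm"),
    ((18, 7), "Cold"), ((20, 10), "Cold"), ((22, 5), "Warm"), ((24, 6), "Warm"),
]

def one_nn(query):
    # ASSUMPTION: Euclidean distance on raw values; the excerpt does not fix a metric.
    point, label = min(observations, key=lambda obs: math.dist(obs[0], query))
    return point, label

point, label = one_nn((16, 3))
print(point, label)  # prints: (15, 5) Cold
```

Under this assumption the closest observation is (15℃, 5 km/h) at a distance of √5 ≈ 2.24, so 1-NN predicts Cold. The next closest, (20℃, 3 km/h), is 4 away. Summing absolute differences instead (Manhattan distance) gives distances of 3 and 4 for the same two points, so the prediction does not change.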

ISBN: 9781836204596