Python for Data Analysis, Third Edition

by Wes McKinney
May 2025
Intermediate to advanced
582 pages
8h 8m
Chinese
O'Reilly Media, Inc.

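Appendix A. Advanced NumPy

In this appendix, I will go deeper into the NumPy library for array computing. This includes more internal details about the ndarray type, as well as more advanced array manipulations and algorithms.

This appendix contains miscellaneous topics and does not need to be read linearly. Throughout, I will generate random data for many examples using the default random number generator in the numpy.random module: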

In [11]: rng = np.random.default_rng(seed=12345)

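A.1 ndarray Object Internals

The NumPy ndarray provides a way to interpret a block of homogeneously typed data (either contiguous or strided) as a multidimensional array object. The data type, or dtype, determines how the data is interpreted: as floating point, integer, Boolean, or any of the other types we have already looked at.

Part of what makes ndarray flexible is that every array object is a strided view on a block of data. You might wonder, for example, how the array view arr[::2, ::-1] does not copy any data. The reason is that the ndarray is more than just a chunk of memory and a data type; it also carries striding information that lets the array move through memory with varying step sizes. More precisely, the ndarray internally consists of the following: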

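  • A pointer to data, that is, a block of data in RAM or in a memory-mapped file

  • The data type, or dtype, describing the fixed-size value cells in the array

  • A tuple indicating the array's shape

  • A tuple of strides: integers indicating the number of bytes to "step" in order to advance one element along a dimension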
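As a minimal sketch of the four components listed above, the snippet below inspects them on a small array and then checks that a strided slice shares memory with its source. The array contents and the variable names (`arr`, `view`) are illustrative choices, not taken from the book.

```python
import numpy as np

# A small 3 x 4 array of 8-byte integers (illustrative data)
arr = np.arange(12, dtype=np.int64).reshape(3, 4)

# The pieces of state that make up an ndarray
print(arr.ctypes.data)  # integer memory address of the first element
print(arr.dtype)        # int64
print(arr.shape)        # (3, 4)
print(arr.strides)      # (32, 8): 32 bytes to the next row, 8 to the next column

# Every other row, columns reversed: only the metadata changes
view = arr[::2, ::-1]
print(view.shape)       # (2, 4)
print(view.strides)     # (64, -8)

# The view starts at arr[0, 3], which is 3 * 8 = 24 bytes past arr[0, 0]
print(view.ctypes.data - arr.ctypes.data)  # 24

# No data was copied, so a write through the view is visible in arr
print(np.shares_memory(view, arr))  # True
view[0, 0] = 99
print(arr[0, 3])        # 99
```

Only the start address, shape, and strides differ between `arr` and `view`; the underlying data block is the same.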

See Figure A-1 for a simple mock-up of the ndarray internals.

Figure A-1. The NumPy ndarray object

For example, a 10 × 5 array has the shape (10, 5):

In [12]: np.ones((10, 5)).shape
Out[12]: (10, 5)

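A typical (C order) 3 × 4 × 5 array of float64 (8-byte) values has the strides (160, 40, 8). Knowing about the strides can be useful because, in general, the larger the strides on a particular axis, the more costly it is to perform computation along that axis: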

In [13]: np.ones((3, 4, 5), dtype=np.float64).strides
Out[13]: (160, 40, 8)
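The strides (160, 40, 8) follow from the shape and the item size: in C order the last axis advances by one item (8 bytes), the middle axis by 5 items (40 bytes), and the first axis by 4 × 5 items (160 bytes). The helper below, `c_order_strides`, is a hypothetical name introduced only to illustrate that arithmetic; it is not a NumPy function.

```python
import numpy as np

def c_order_strides(shape, itemsize):
    """Byte strides of a C-contiguous array (illustrative helper, not a NumPy API)."""
    strides = []
    step = itemsize
    # Walk the axes from last to first; each axis steps over all axes after it
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

arr = np.ones((3, 4, 5), dtype=np.float64)
computed = c_order_strides(arr.shape, arr.itemsize)
print(computed)             # (160, 40, 8)
print(computed == arr.strides)  # True

# In Fortran (column-major) order the first axis is the fastest-moving one
fortran = np.ones((3, 4, 5), dtype=np.float64, order="F")
print(fortran.strides)      # (8, 24, 96)
```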

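While it is rare for a typical NumPy user to be interested in the array strides, they are needed to construct "zero-copy" array views. Strides can even be negative, which enables an array to move "backward" through memory (this is the case, for example, in a slice like obj[::-1] or obj[:, ::-1]).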
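The following sketch shows negative strides on reversed slices and one way to build a zero-copy view by hand with `numpy.lib.stride_tricks.as_strided`. The sliding-window layout and the names `obj`, `grid`, and `windows` are assumptions chosen for illustration; the book text itself does not give this example.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

obj = np.arange(6, dtype=np.int64)

# Reversing a 1-D array flips the sign of its stride
print(obj.strides)        # (8,)
print(obj[::-1].strides)  # (-8,)

# Reversing only the columns of a 2 x 3 array
grid = obj.reshape(2, 3)
print(grid[:, ::-1].strides)  # (24, -8)

# Zero-copy sliding windows of length 3: each row starts one item later
window = 3
n_windows = obj.shape[0] - window + 1
step = obj.strides[0]
windows = as_strided(
    obj,
    shape=(n_windows, window),
    strides=(step, step),
    writeable=False,  # overlapping rows alias the same memory, so keep it read-only
)
print(windows.tolist())   # [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(np.shares_memory(windows, obj))  # True
```

`as_strided` does no bounds checking, so a wrong shape or stride can read arbitrary memory. In NumPy 1.20 and later, `numpy.lib.stride_tricks.sliding_window_view` is a safer way to get the same windows.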

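NumPy Data Type Hierarchy

You may occasionally have code that needs to check whether an array contains integers, floating-point numbers, strings, or Python objects. Because there are multiple types of floating-point numbers (float16 through float128), checking that the data type is among a list of types would be very verbose. Fortunately, the data types have superclasses, such as np.integer and np.floating, which can be used with the np.issubdtype function: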

In [14]: ints = np.ones(10, dtype=np.uint16)

In [15]: floats = np.ones(10, dtype=np.float32)

In [16]: np.issubdtype(ints.dtype, np.integer)
Out[16]: True

In [17]: np.issubdtype(floats.dtype, np.floating)
Out[17]: True
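Building on the checks above, a small dispatch function can classify an array's dtype without listing every concrete type. The function name `describe_dtype` and the category labels are hypothetical, introduced only for this sketch.

```python
import numpy as np

def describe_dtype(dtype):
    """Return a coarse label for a dtype (illustrative helper, not a NumPy API)."""
    if np.issubdtype(dtype, np.integer):
        return "integer"
    if np.issubdtype(dtype, np.floating):
        return "floating"
    if np.issubdtype(dtype, np.str_):
        return "string"
    if np.issubdtype(dtype, np.object_):
        return "object"
    return "other"

print(describe_dtype(np.ones(3, dtype=np.uint16).dtype))    # integer
print(describe_dtype(np.ones(3, dtype=np.float32).dtype))   # floating
print(describe_dtype(np.array(["a", "b"]).dtype))           # string
print(describe_dtype(np.array([1, "a"], dtype=object).dtype))  # object
print(describe_dtype(np.array([True, False]).dtype))        # other
```

Note that Boolean arrays fall through to "other" here, because np.bool_ is not a subclass of np.integer in NumPy's type hierarchy.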

You can also see all of the parent classes of a specific data type by calling the type's mro method: ...
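Since the preview is cut off at this point, the following is only a hedged illustration of the mro call, not a reproduction of the book's own listing. It uses np.float64 as an assumed example type and checks which abstract classes appear among its parents.

```python
import numpy as np

# mro() returns the method resolution order: the type itself, then its parents
parents = np.float64.mro()
for cls in parents:
    print(cls.__name__)

# The abstract superclasses usable with np.issubdtype appear in that list
print(np.floating in parents)  # True
print(np.number in parents)    # True
print(np.generic in parents)   # True
print(np.integer in parents)   # False
```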

Publisher Resources

ISBN: 9798341656734