Chapter 14. Data Indexing and Selection
In Part II, we looked in detail at methods and tools to access, set, and modify values in NumPy arrays. These included indexing (e.g., arr[2, 1]), slicing (e.g., arr[:, 1:5]), masking (e.g., arr[arr > 0]), fancy indexing (e.g., arr[0, [1, 5]]), and combinations thereof (e.g., arr[:, [1, 5]]).
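The sketch below runs each of these five NumPy patterns on a small array. The array `arr` and its values are hypothetical and chosen only so that every expression has something to select. The comments describe what each expression returns.

```python
import numpy as np

# Hypothetical 3 x 6 array: row 0 holds -5..0, row 1 holds 1..6, row 2 holds 7..12
arr = np.arange(18).reshape(3, 6) - 5

item = arr[2, 1]          # indexing: a single element (row 2, column 1)
block = arr[:, 1:5]       # slicing: all rows, columns 1 through 4
positives = arr[arr > 0]  # masking: 1-D array of the elements greater than zero
picked = arr[0, [1, 5]]   # fancy indexing: row 0, columns 1 and 5
combo = arr[:, [1, 5]]    # combination: a row slice plus a list of columns

print(item, block.shape, picked, combo.shape)
```

Masking and fancy indexing return copies, while a basic slice such as `arr[:, 1:5]` returns a view onto the original data.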
Here we’ll look at similar means of accessing and modifying values in Pandas Series and DataFrame objects. If you have used the NumPy patterns, the corresponding patterns in Pandas will feel very familiar, though there are a few quirks to be aware of.
We’ll start with the simple case of the one-dimensional Series object, and then move on to the more complicated two-dimensional DataFrame object.
Data Selection in Series
As you saw in the previous chapter, a Series object acts in many ways like a one-dimensional NumPy array, and in many ways like a standard Python dictionary. If you keep these two overlapping analogies in mind, it will help you understand the patterns of data indexing and selection in these arrays.
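The two analogies can be seen side by side in a minimal sketch. A label lookup behaves like a dictionary. The `iloc` attribute, which Pandas provides for explicit positional access, behaves like array indexing, and vectorized arithmetic and Boolean masks work as they do in NumPy. The example Series simply reuses the values that appear later in this section.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Dictionary-like: look up a value by its index label
by_label = data['b']

# Array-like: look up the same value by its integer position
by_position = data.iloc[1]

# Array-like: vectorized arithmetic and Boolean masking, as with NumPy arrays
doubled = data * 2
large = data[data > 0.5]   # keeps labels 'c' and 'd'

print(by_label, by_position)
```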
Series as Dictionary
Like a dictionary, the Series object provides a mapping from a collection of keys to a collection of values:
In [1]: import pandas as pd
        data = pd.Series([0.25, 0.5, 0.75, 1.0],
                         index=['a', 'b', 'c', 'd'])
        data
Out[1]: a    0.25
        b    0.50
        c    0.75
        d    1.00
        dtype: float64
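The dictionary analogy also holds in reverse. The following sketch builds the same Series directly from a plain Python dict, where the keys become the index labels in insertion order, and then checks that it matches the list-plus-index construction used above. The variable names here are illustrative only.

```python
import pandas as pd

# The same key/value pairs, written as an ordinary dictionary
mapping = {'a': 0.25, 'b': 0.5, 'c': 0.75, 'd': 1.0}
from_dict = pd.Series(mapping)

# The construction used in the In [1] cell
from_lists = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# equals() compares the index, the values, and the dtype
print(from_dict.equals(from_lists))
```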
In [2]: data['b']
Out[2]: 0.5
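Label lookup also fails the way a dictionary lookup does. In this sketch, indexing with a label that is not in the index raises `KeyError`, and `Series.get` returns a default value instead of raising, much like `dict.get`. The label `'z'` and the default `0.0` are arbitrary choices for illustration.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Like a dict, bracket lookup of an absent label raises KeyError
try:
    data['z']
    missing_raised = False
except KeyError:
    missing_raised = True

# Like dict.get, Series.get falls back to a default instead of raising
fallback = data.get('z', default=0.0)
present = data.get('c')

print(missing_raised, fallback, present)
```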
We can also use dictionary-like Python expressions and methods to examine the keys/indices and values:
In [3]: 'a' in data
Out[3]: True
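Here are a few more of these dictionary-style expressions, in a minimal sketch. One quirk is that `in` tests the index labels, not the values, just as `in` on a dict tests its keys. `keys()` and `items()` mirror their dict counterparts, and assigning to a new label extends the Series. The new label `'e'` and its value are hypothetical.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

has_a = 'a' in data           # True: membership checks the index labels
has_value = 0.25 in data      # False: values are not searched by `in`
labels = list(data.keys())    # the index labels, like dict.keys()
pairs = list(data.items())    # (label, value) tuples, like dict.items()
values = data.values.tolist() # the underlying values as a Python list

# Like a dict, assigning to a label that does not exist yet adds a new entry
data['e'] = 1.25

print(has_a, has_value, labels, len(data))
```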