Chapter 12. Data Manipulation and Visualization in Python
In Chapter 8 you learned how to manipulate and visualize data, with heavy help from the tidyverse suite of packages. Here we’ll demonstrate similar techniques on the same star dataset, this time in Python. In particular, we’ll use pandas and seaborn to manipulate and visualize data, respectively. This isn’t a comprehensive guide to what these modules, or Python, can do with data analysis. Instead, it’s enough to get you exploring on your own.
As much as possible, I’ll mirror the steps and perform the same operations that we did in Chapter 8. Because of this familiarity, I’ll focus less on the whys of manipulating and visualizing data than on the hows of doing it in Python. Let’s load the necessary modules and get started with star. The third module, matplotlib, is new for you and will be used to complement our work in seaborn. It comes installed with Anaconda. Specifically, we’ll be using the pyplot submodule, aliasing it as plt.
In [1]: import pandas as pd
        import seaborn as sns
        import matplotlib.pyplot as plt

        star = pd.read_excel('datasets/star/star.xlsx')
        star.head()
Out[1]:
   tmathssk  treadssk             classk  totexpk   sex  freelunk   race  \
0       473       447        small.class        7  girl        no  white
1       536       450        small.class       21  girl        no  black
2       463       439  regular.with.aide        0   boy       yes  black
3       559       448            regular       16   boy        no  white
4       489       447        small.class        5   boy       yes  white

   schidkn
0       63
1       20
2       19
3       69
4       79
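If you don’t have the Excel file at hand, the following is a minimal sketch that rebuilds only the five rows printed above as an in-memory DataFrame and checks its basic structure. The names sample and numeric_columns are illustrative assumptions, not part of the book’s code, and five hand-typed rows are no substitute for the full star dataset.

```python
import pandas as pd

# Hand-typed copy of the five rows shown in the head() output above.
# Assumption: this stands in for datasets/star/star.xlsx for illustration only.
sample = pd.DataFrame({
    "tmathssk": [473, 536, 463, 559, 489],
    "treadssk": [447, 450, 439, 448, 447],
    "classk": ["small.class", "small.class", "regular.with.aide",
               "regular", "small.class"],
    "totexpk": [7, 21, 0, 16, 5],
    "sex": ["girl", "girl", "boy", "boy", "boy"],
    "freelunk": ["no", "no", "yes", "no", "yes"],
    "race": ["white", "black", "black", "white", "white"],
    "schidkn": [63, 20, 19, 69, 79],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = sample.shape

# select_dtypes keeps only the columns whose dtype matches the filter.
numeric_columns = list(sample.select_dtypes(include="number").columns)

print(n_rows, n_cols)
print(numeric_columns)
```

Checking the shape and the split between numeric and text columns is a quick way to confirm that the data loaded as expected before moving on to manipulation.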
Column-Wise Operations
In Chapter 11 you learned that pandas will ...