Machine Learning Pocket Reference

by Matt Harrison
August 2019
Intermediate to advanced
318 pages
4h 40m
English
O'Reilly Media, Inc.

Chapter 6. Exploring

It has been said that it is easier to take a subject matter expert (SME) and train them in data science than the reverse. I’m not sure I agree with that 100%, but it is true that data has nuance and that an SME can help tease it apart. By understanding both the business and the data, they are able to create better models and have a greater impact on their business.

Before I create a model, I do some exploratory data analysis. This gives me a feel for the data, and it is also a great excuse to meet with the business units that control that data and discuss issues with them.

Data Size

Again, we are using the Titanic dataset here. The pandas .shape property will return a tuple of the number of rows and columns:

>>> X.shape
(1309, 13)

We can see that this dataset has 1,309 rows and 13 columns.
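As a minimal sketch of the same idea, the tuple can be unpacked into separate row and column counts. The small DataFrame below is a hypothetical stand-in with made-up values, not the book's Titanic data:

```python
import pandas as pd

# Hypothetical stand-in for the Titanic features; the values are made up.
df = pd.DataFrame(
    {
        "pclass": [1, 3, 2, 3],
        "age": [29.0, 2.0, 30.0, 25.0],
        "fare": [211.34, 151.55, 13.0, 7.25],
    }
)

# .shape is accessed without parentheses and yields (rows, columns).
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")
```

Unpacking the tuple this way is handy when the row count is needed later, for example to compare against per-column counts.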

Summary Stats

We can use pandas to get summary statistics for our data. Along with the statistics themselves, the .describe method gives us the count of non-NaN values in each column. Let’s look at the results for the first and last columns:

>>> X.describe().iloc[:, [0, -1]]
            pclass   embarked_S
count  1309.000000  1309.000000
mean     -0.012831     0.698243
std       0.995822     0.459196
min      -1.551881     0.000000
25%      -0.363317     0.000000
50%       0.825248     1.000000
75%       0.825248     1.000000
max       0.825248     1.000000

The count row tells us that both of these columns are filled in: each has 1,309 non-NaN values, which matches the number of rows, so there are no missing values. We also have the mean, standard deviation, minimum, maximum, and quartile values.
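To illustrate how the count row reveals missing data, here is a small sketch on a hypothetical DataFrame (made-up values, with one NaN deliberately placed in the age column). It compares each column's count against the number of rows and cross-checks the result with .isna:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age; not the book's Titanic frame.
df = pd.DataFrame(
    {
        "pclass": [1, 3, 2, 3],
        "age": [29.0, np.nan, 30.0, 25.0],
        "embarked_S": [1, 0, 1, 1],
    }
)

stats = df.describe()

# Select the first and last columns of the summary by position.
summary = stats.iloc[:, [0, -1]]

# The count row holds the number of non-NaN values per column,
# so subtracting it from the row count gives the missing values.
counts = stats.loc["count"]
missing = len(df) - counts

# Columns whose count equals the row count have no missing values.
complete = counts[counts == len(df)].index.tolist()

print(summary)
print(missing)
print(df.isna().sum())  # direct check of missing values per column
```

Note that .describe only summarizes numeric columns by default, so for a frame with mixed types, df.isna().sum() is the more direct way to check every column.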

Note

A pandas DataFrame has an iloc attribute that we can do index operations on. It will let us pick ...


ISBN: 9781492047537