Think Stats, 2nd Edition

by Allen B. Downey
October 2014
Content level: Beginner
226 pages
5h 42m
English
O'Reilly Media, Inc.

Chapter 7. Relationships Between Variables

So far we have only looked at one variable at a time. In this chapter we look at relationships between variables. Two variables are related if knowing one gives you information about the other. For example, height and weight are related; people who are taller tend to be heavier. Of course, it is not a perfect relationship: there are short heavy people and tall light ones. But if you are trying to guess someone’s weight, you will be more accurate if you know their height than if you don’t.
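The claim that knowing height improves a guess about weight can be checked numerically. The following is a minimal sketch using simulated data rather than the BRFSS; the slope, intercept, noise level, and random seed are made-up values chosen only for illustration. It compares the error of always guessing the mean weight with the error of a guess based on a straight line fitted to height.

```python
import numpy as np

# Simulated data, not BRFSS: the slope, intercept, and noise level are
# made-up values chosen only to produce a plausible-looking relationship.
rng = np.random.default_rng(17)
heights = rng.normal(170, 10, size=1000)                     # cm
weights = 0.9 * heights - 75 + rng.normal(0, 12, size=1000)  # kg

def rmse(errors):
    """Root mean squared error of an array of errors."""
    return np.sqrt(np.mean(np.square(errors)))

# Guess 1: ignore height and always guess the mean weight.
rmse_without = rmse(weights - weights.mean())

# Guess 2: use height, via a least-squares line fitted to the same data.
slope, intercept = np.polyfit(heights, weights, 1)
rmse_with = rmse(weights - (slope * heights + intercept))

print('RMSE ignoring height:', rmse_without)
print('RMSE using height:   ', rmse_with)
```

Under these assumptions the second error is smaller than the first, which is what "related" means here: height carries information about weight, even though the relationship is far from perfect.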

The code for this chapter is in scatter.py. For information about downloading and working with this code, see Using the Code.

Scatter Plots

The simplest way to check for a relationship between two variables is a scatter plot, but making a good scatter plot is not always easy. As an example, I’ll plot weight versus height for the respondents in the BRFSS (see The lognormal Distribution).

Here’s the code that reads the data file and extracts height and weight:

    df = brfss.ReadBrfss(nrows=None)
    sample = thinkstats2.SampleRows(df, 5000)
    heights, weights = sample.htm3, sample.wtkg2

SampleRows chooses a random subset of the data:

    import numpy as np

    def SampleRows(df, nrows, replace=False):
        """Choose nrows rows of df at random, with or without replacement."""
        indices = np.random.choice(df.index, nrows, replace=replace)
        sample = df.loc[indices]
        return sample

df is the DataFrame, nrows is the number of rows to choose, and replace is a boolean indicating whether sampling should be done with replacement; in other words, whether the same row could be chosen more ...
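The effect of replace is easiest to see on a tiny example. This is a minimal sketch that repeats the SampleRows definition and applies it to a made-up five-row DataFrame standing in for the BRFSS data; the column name x and the seed are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def SampleRows(df, nrows, replace=False):
    """Choose nrows rows of df at random, with or without replacement."""
    indices = np.random.choice(df.index, nrows, replace=replace)
    sample = df.loc[indices]
    return sample

# Hypothetical five-row DataFrame standing in for the BRFSS data.
df = pd.DataFrame({'x': [10, 20, 30, 40, 50]})

np.random.seed(1)  # only to make the sketch repeatable

# Without replacement: each row can appear at most once.
without = SampleRows(df, 5)
print(without.index.is_unique)

# With replacement: we can ask for more rows than the frame has,
# so at least one row must appear more than once.
with_repl = SampleRows(df, 8, replace=True)
print(with_repl.index.is_unique)

# Without replacement, asking for more rows than exist raises ValueError.
try:
    SampleRows(df, 8)
except ValueError as err:
    print('cannot sample:', err)
```

Sampling without replacement is the default, so a sample of 5000 respondents drawn this way contains 5000 distinct rows.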


Publisher Resources

ISBN: 9781491907344