11 Statistics for Data Scientists

Data scientists need a working knowledge of statistics as well as computer science to analyze data effectively.

11.1 Types of Variables

  1. QUANTITATIVE VARIABLES: Data that consists of counts or measurements. You can perform arithmetic operations on them. There are two types:
    1. A discrete variable takes a countable number of distinct values within a specified range. For example, 14, 15, 17.
    2. A continuous variable can take any of the infinitely many values in an interval, without any break or jump. For example, any value from 14 to 16 (this includes 14.001 and also 14.0000001).
  2. CATEGORICAL VARIABLES: Variables that denote groupings or labels. Arithmetic operations cannot be performed on them.
    1. A nominal variable has no ordering among its observed levels, groups, or categories. For example, gender (male and female cannot be ordered).
    2. An ordinal variable has a meaningful ordering among its levels. For example, disease condition divided into the categories low, moderate, and severe.
Tree diagram displaying “Variable” branching to “Numeric” and “Categorical,” with “Numeric” branching to “Continuous” and “Discrete” and “Categorical” branching to “Ordinal” and “Nominal.”

Figure 11.1 Variable types.
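
The four variable types can be illustrated with a minimal sketch. Python with only the standard library is used here as an assumption, since this section shows no code of its own. The variable names (`visits`, `weights`, `genders`, `conditions`) and the data values are hypothetical and chosen only to mirror the examples in the list above.

```python
from collections import Counter

# Quantitative, discrete: counts take separate, countable values.
visits = [14, 15, 17]

# Quantitative, continuous: measurements can fall anywhere in an interval.
weights = [14.001, 14.0000001, 15.5]

# Arithmetic is meaningful for quantitative variables.
mean_visits = sum(visits) / len(visits)
mean_weight = sum(weights) / len(weights)

# Categorical, nominal: labels with no inherent order.
# Counting category frequencies is meaningful; averaging the labels is not.
genders = ["male", "female", "female"]
gender_counts = Counter(genders)

# Categorical, ordinal: labels with a meaningful order.
# The order is stated explicitly (hypothetical helper list).
SEVERITY_ORDER = ["low", "moderate", "severe"]
conditions = ["moderate", "low", "severe", "low"]
ranked = sorted(conditions, key=SEVERITY_ORDER.index)

print(mean_visits, mean_weight)
print(gender_counts)
print(ranked)
```

Keeping the ordinal levels as strings with a separate ordering list, rather than coding them as 1, 2, 3, allows sorting and comparison without inviting arithmetic on the labels.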

11.2 Statistical Methods for Data Analysis

Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation).
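
The difference between the two methods can be sketched on a small sample. This example uses Python's standard `statistics` module as an assumption, and the `sample` values are hypothetical. The descriptive step summarizes the sample itself. The inferential step builds an approximate 95% confidence interval for the population mean, using a normal approximation that is rough for a sample this small (a t-based interval would be somewhat wider).

```python
import math
import statistics

# Hypothetical sample of eight measurements (illustrative values only).
sample = [14.2, 15.1, 13.8, 16.0, 14.9, 15.4, 14.6, 15.0]
n = len(sample)

# Descriptive statistics: summarize the observed sample.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: estimate the population mean while accounting
# for sampling variation. Assumption: normal approximation.
standard_error = sample_sd / math.sqrt(n)
z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95% critical value
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"mean = {sample_mean:.3f}, sd = {sample_sd:.3f}")
print(f"approximate 95% CI for the mean: ({ci_low:.3f}, {ci_high:.3f})")
```

The mean and standard deviation describe only these eight values. The interval is a statement about the wider population the sample was drawn from, which is what makes it inferential.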

Descriptive statistics Inferential statistics ...
