11 Statistics for Data Scientists
Data scientists need a grounding in statistics as well as computer science to analyze data effectively.
11.1 Types of Variables
- QUANTITATIVE VARIABLES: Data that consist of counts or measurements. Arithmetic operations can be performed on them. There are two types:
  - A discrete variable takes a countable number of values within a specified range. For example, 14, 15, 17.
  - A continuous variable can take any value within a range, with no breaks or jumps. For example, any value from 14 to 16 (this includes 14.001 and also 14.0000001).
- CATEGORICAL VARIABLES: Variables that denote groupings or labels. Arithmetic operations cannot be performed on them. There are two types:
  - A nominal variable has no ordering among its observed levels, groups, or categories. For example, gender (male and female cannot be ordered).
  - An ordinal variable has a meaningful ordering among its levels. For example, disease condition divided into the categories low, moderate, and severe.
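The four variable types can be illustrated with a minimal Python sketch. All data values and names here (`visits`, `temps`, `gender`, `severity`, `SEVERITY_ORDER`) are hypothetical and chosen only for illustration; the point is which operations make sense for each type.

```python
from statistics import mean, mode

# Hypothetical example data, not taken from the text.
visits = [14, 15, 17]                             # discrete: countable values
temps = [14.001, 15.2, 16.0]                      # continuous: any value in a range
gender = ["male", "female", "female"]             # nominal: labels with no order
severity = ["low", "severe", "moderate", "low"]   # ordinal: labels with an order

# Assumed ordering of the ordinal levels, lowest to highest.
SEVERITY_ORDER = ["low", "moderate", "severe"]

# Arithmetic is meaningful for quantitative variables.
mean_visits = mean(visits)
mean_temp = mean(temps)

# For nominal data, only counting-based summaries such as the mode make sense.
most_common_gender = mode(gender)

# Ordinal data can additionally be sorted and compared by level.
sorted_severity = sorted(severity, key=SEVERITY_ORDER.index)
worst_severity = max(severity, key=SEVERITY_ORDER.index)

print(mean_visits, mean_temp, most_common_gender, sorted_severity, worst_severity)
```

Note that the ordering of the ordinal levels has to be supplied explicitly; sorting the labels alphabetically would not reflect their meaning.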
11.2 Statistical Methods for Data Analysis
Two main statistical methods are used in data analysis. Descriptive statistics summarize data from a sample using indexes such as the mean or standard deviation. Inferential statistics draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation).
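The difference between the two methods can be sketched in Python using only the standard library. The sample values are hypothetical, and the confidence interval uses a normal approximation as a simplifying assumption; for a sample this small, a t-based interval would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of measurements, for illustration only.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]

# Descriptive statistics: summarize the sample itself.
sample_mean = mean(sample)
sample_sd = stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: estimate the population mean from the sample.
# Assumption: normal approximation for an approximate 95% confidence interval.
z = NormalDist().inv_cdf(0.975)
margin = z * sample_sd / sqrt(len(sample))
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print(f"mean={sample_mean:.3f}, sd={sample_sd:.3f}")
print(f"approx. 95% CI for the population mean: ({ci_low:.3f}, {ci_high:.3f})")
```

The mean and standard deviation describe only the eight observed values, whereas the interval is a statement about the unobserved population and therefore carries uncertainty from sampling variation.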
Descriptive statistics | Inferential statistics ... |