Statistical Literacy: Linear Models as a Unifying Concept (using R)
Uncovering the foundational concepts that link inferential statistics to deep learning
Topic: Data
In this course, the big idea is to understand linear models as an essential underlying concept for many statistical and machine learning techniques. They provide a unifying framework that spans basic equations, like the mean and variance, all the way up to modern, complex processes like deep learning.
We’ll cover the fundamentals of linear models and reveal major themes in data analysis using insightful connections and examples.
As in “Statistical Literacy: Inferential Statistics using R”, this course also focuses on developing a deeper understanding of the key concepts that unite what seem, to newcomers, to be disparate techniques. This reveals the underlying concepts and lays the foundation for further study.
What you'll learn and how you can apply it
By the end of this live, hands-on, online course, you’ll understand:
 What linear models are
 The mean and two-sample t-tests as linear models
 Models as bestguess predictions
 The curse of dimensionality
 The minimization of loss functions (residuals)
 Similarities among equations for various situations
 Bias-variance tradeoff
 Complex methods as elaborations of concepts present in simple linear models
And you’ll be able to:
 Understand reported results based on linear models
 Have a solid basis for further independent study
This training course is for you because...
 You encounter linear models but are unclear about what they mean.
 You don’t realize that the mean and variance, t-tests, Ordinary Least Squares regression, and ANOVA are all built on the same fundamental concepts.
 You don’t see how more complex or iterative methods like clustering, gradient descent, and deep learning are also connected to linear models.
 You apply linear models, but are unclear how to interpret the results.
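That shared foundation is easy to see in R itself. As a minimal sketch (simulated data, not course material), the sample mean is the fitted value of an intercept-only linear model, and the sample variance comes from that model's residuals:

```r
# Sketch: the mean and variance recovered from an intercept-only linear model
set.seed(1)
y <- rnorm(20, mean = 5, sd = 2)   # simulated data (assumption, not course data)

fit <- lm(y ~ 1)                   # intercept-only linear model

all.equal(coef(fit)[[1]], mean(y))                        # intercept = sample mean
all.equal(sum(resid(fit)^2) / (length(y) - 1), var(y))    # residual SS gives the variance
```

Both comparisons return TRUE: the "basic equations" really are special cases of a linear model.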
Prerequisites
 Basic-to-intermediate knowledge of R and RStudio
 An understanding of fundamental statistical concepts that are useful but not explicitly covered in this course:
 Simple random samples
 Systematic vs random error & types of selection bias
 Measures for location and spread
Recommended preparation:
 Datasets used will be built-in datasets available in R or provided via a GitHub repository for use after the class.
 An RStudio account is needed for the in-course exercises. RStudio Cloud projects, preloaded with exercise scripts and datasets, will be provided shortly before the course.
About your instructor

Rick Scavetta has worked as an independent data science trainer since 2012. Operating as Scavetta Academy, Rick has a close and recurring presence at primary research institutes all over Germany, including many Max Planck Institutes and Excellence Clusters, in fields as varied as primatology, earth sciences, marine biology, molecular genetics, and behavioral psychology.
Schedule
The timeframes are only estimates and may vary according to how the class is progressing.
Introduction (20 minutes)
 Discussion: What are models? Linear models? Where do they appear?
 Lecture: Overview of methods explored in this course
 Q&A
Classic OLS regression (60 minutes)
 Lecture: Defining models, the bias-variance tradeoff, and minimizing loss functions
 Demonstration: The basics of linear models in R
 Handson Exercise: Coding OLS regression from scratch
 Q&A
 5 minute break
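The "from scratch" exercise above could look something like the following sketch (simulated data and my own setup, not the course's actual exercise script): OLS coefficients come from solving the normal equations, which minimize the sum of squared residuals.

```r
# Sketch: OLS from scratch via the normal equations (X'X) b = X'y
set.seed(42)
x <- runif(30)
y <- 2 + 3 * x + rnorm(30, sd = 0.5)   # simulated data (assumption)

X <- cbind(1, x)                       # design matrix: intercept column plus x
b <- solve(t(X) %*% X, t(X) %*% y)     # least-squares coefficients

all.equal(as.vector(b), unname(coef(lm(y ~ x))))   # matches lm()
```

The hand-rolled coefficients agree with `lm()` to numerical precision, which is the point of the exercise: `lm()` is doing exactly this minimization.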
Other statistical tests (30 minutes)
 Lecture: Understanding two-sample t-tests and ANOVA, and the curse of dimensionality
 Discussion: Similarities to regression
 Demonstration: Executing a t-test and ANOVA as linear models
 Handson Exercise: Performing tests in R
 Q&A
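As a sketch of this session's demonstration (using a built-in R dataset; the exact examples may differ in class), both the one-way ANOVA and the pooled two-sample t-test are linear models with a categorical predictor:

```r
# Sketch: t-test and ANOVA as linear models, using the built-in PlantGrowth data
data(PlantGrowth)   # plant weight by group: ctrl, trt1, trt2

# ANOVA as a linear model: anova(lm(...)) gives the same F table as aov()
fit <- lm(weight ~ group, data = PlantGrowth)
anova(fit)

# Pooled two-sample t-test as a linear model on two of the groups
sub <- subset(PlantGrowth, group %in% c("ctrl", "trt1"))
sub$group <- droplevels(sub$group)
p_ttest <- t.test(weight ~ group, data = sub, var.equal = TRUE)$p.value
p_lm    <- summary(lm(weight ~ group, data = sub))$coefficients["grouptrt1", "Pr(>|t|)"]
all.equal(p_ttest, p_lm)   # the two p-values are identical
```

The slope test on the group dummy variable in `lm()` is the pooled t-test; ANOVA just generalizes this to more than two groups.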
Extending linear models (30 minutes)
 Lecture: Elaborating on simple models for regression and ANOVA
 Handson Exercises: Exploring model forms in R
 Q&A
 5 minute break
Complex methods (30 minutes)
 Lecture: Analytical versus iterative approaches to minimizing the loss function
 Discussion: Linear models as the basis for advanced methods
 Exercise: Executing advanced methods in R
 Wrapup and Q&A