Degrees of freedom
To complete our calculation of the variance we need the degrees of freedom (d.f.). This important concept in statistics is defined as follows:

d.f. = n – k,

which is the sample size, n, minus the number of parameters, k, estimated from the data. For the variance, we have estimated one parameter from the data, the mean ȳ, and so there are n – 1 degrees of freedom. In a linear regression, we estimate two parameters from the data, the slope and the intercept, and so there are n – 2 degrees of freedom in a regression analysis.
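This bookkeeping can be checked directly in R: for a fitted model, df.residual() reports n minus the number of estimated parameters. A small sketch with made-up data (the values here are illustrative, not from the text):

```r
# Illustrative data: n = 10 points (not from the text)
x <- 1:10
y <- c(3, 5, 6, 9, 11, 12, 15, 17, 18, 21)

model <- lm(y ~ x)   # linear regression: estimates intercept and slope, so k = 2
df.residual(model)   # n - k = 10 - 2 = 8
```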
Variance is denoted by the lower-case Latin letter s squared: s². The square root of variance, s, is called the standard deviation. We always calculate variance as the sum of squares divided by the degrees of freedom:

s² = Σ(y – ȳ)² / (n – 1).
Consider the following data, y:
y<-c(13,7,5,12,9,15,6,11,9,7,12)
We need to write a function to calculate the sample variance: we call it variance and define it like this:
variance <- function(x) sum((x - mean(x))^2)/(length(x) - 1)
and use it like this:
variance(y)
[1] 10.25455
Our measure of variability in these data, the variance, is thus 10.25455. It is said to be an unbiased estimator because we divide the sum of squares by the degrees of freedom (n – 1) rather than by the sample size, n, to compensate for the fact that the deviations are measured from the sample mean rather than the unknown true mean, which would otherwise make the estimate systematically too small.
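As a quick sanity check, R's built-in var() function uses the same n – 1 divisor, so it should agree exactly with the hand-rolled function above:

```r
y <- c(13, 7, 5, 12, 9, 15, 6, 11, 9, 7, 12)
variance <- function(x) sum((x - mean(x))^2)/(length(x) - 1)

variance(y)   # 10.25455
var(y)        # built-in sample variance: also divides by n - 1, so identical
```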