The purpose of this chapter is threefold: (i) to review many basic notions from simple regression (the linear regression model, ordinary least squares [OLS], and the central limit theorem, which in this context means basic inference); (ii) to introduce some more advanced features of R (matrix commands, curve fitting, plotting, and “inquiry” functions); and (iii) to introduce the idea of simulating data.
Real data is very important in statistics, but so is simulated data. Simulated data has known characteristics, allowing the student/programmer to examine the performance of algorithms, plots, and formulas in best- and worst-case scenarios. Simulating data based on formulas and models allows the student/programmer to operationalize those formulas and models, often leading to a more complete understanding of what the formula or model is “saying.” The ability to simulate data allows the student/programmer to quickly check conjectures and produce useful examples and counterexamples. It is the opinion of the author that the ability to effortlessly and routinely simulate data is a skill all statisticians should have.
Imagine data produced by the following simple model: $y_k = \beta_0 + \beta_1 x_k + \epsilon_k$.
The errors, $\epsilon_k$, are normally distributed, have mean zero, have equal spread (constant variance), and are independent of one another.
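The model above can be turned into simulated data in a few lines of R. The following is a minimal sketch, not a definitive recipe: the sample size, the true coefficient values, the error standard deviation, and the uniform design for the predictor are all assumptions chosen purely for illustration.

```r
# Sketch: simulate data from y_k = beta0 + beta1 * x_k + eps_k, then refit by OLS.
# Every numeric value below is an illustrative assumption, not taken from the text.
set.seed(1)                              # make the simulation reproducible

n     <- 100                             # sample size (assumed)
beta0 <- 2                               # true intercept (assumed)
beta1 <- 0.5                             # true slope (assumed)
sigma <- 1                               # common error standard deviation (assumed)

x   <- runif(n, min = 0, max = 10)       # predictor values (the design is an assumption)
eps <- rnorm(n, mean = 0, sd = sigma)    # independent normal errors: mean zero, equal spread
y   <- beta0 + beta1 * x + eps           # responses generated by the model

fit <- lm(y ~ x)                         # ordinary least squares fit
coef(fit)                                # estimates should land near beta0 and beta1
```

Because the true values of the intercept and slope are known here, the fitted coefficients can be compared directly against them, which is exactly what real data does not allow. Changing `n` or `sigma` and rerunning gives a quick feel for how the estimates vary.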
Note: A key difference between a traditional statistical problem and a time series problem ...