We'll generate a synthetic dataset containing two independent variables. We will then contaminate it with an outlier and compare how lm and rlm perform:
- First, we generate a synthetic dataset, containing two independent Gaussian variables. y is equal to x1 + x2 plus a Gaussian residual:
set.seed(10)
x1 = rnorm(100,0,2)
x2 = rnorm(100,0,2)
y = x1 + x2 + rnorm(100,0,1)
- Now, we introduce an extreme value into y by setting its last observation, y[100], to 100. This will distort our estimated coefficients:
y[100] = 100
plot(x1,y)
The resulting plot shows x1 against the dependent variable y (notice the outlier in the upper-right part of the plot).
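As a minimal sketch of the comparison this recipe sets up, the snippet below rebuilds the contaminated data and fits both models. It assumes that rlm refers to rlm() from the MASS package, used with its default settings (Huber M-estimation). The names fit_lm and fit_rlm are illustrative and do not come from the original text.

```r
library(MASS)  # provides rlm(); assumed to be the rlm the text refers to

# Rebuild the contaminated dataset from the steps above
set.seed(10)
x1 = rnorm(100, 0, 2)
x2 = rnorm(100, 0, 2)
y = x1 + x2 + rnorm(100, 0, 1)
y[100] = 100  # the injected outlier

fit_lm = lm(y ~ x1 + x2)    # ordinary least squares
fit_rlm = rlm(y ~ x1 + x2)  # robust M-estimation (Huber psi by default)

# Side-by-side coefficients; the data were generated with
# intercept 0 and slopes 1 and 1
print(cbind(lm = coef(fit_lm), rlm = coef(fit_rlm)))

# Final weight that rlm assigned to the contaminated observation;
# values well below 1 mean the point was downweighted
print(fit_rlm$w[100])
```

Because the data were generated with an intercept of 0 and slopes of 1, the distance of each set of coefficients from those values gives a rough sense of how much the single outlier affects each estimator.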
- The coefficients ...