Given a variable, outliers are values that are very distant from the other values of that variable. Outliers are quite common and are often caused by human or measurement errors, and they can strongly distort a model.
To demonstrate, let's look at two simple datasets and see how their mean is influenced by the presence of an outlier.
Consider two small datasets: A = [1, 2, 3, 4] and B = [1, 2, 3, 4, 100]. The fifth value in B, 100, is obviously an outlier, and adding it raises the mean from mean(A) = 2.5 to mean(B) = 22. A single outlier can therefore shift a summary statistic dramatically. Since many machine learning algorithms rely on distance or variance computations, outliers can also have a large impact on the performance of a model.
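The arithmetic above can be checked with a short Python sketch. The helper names `mean` and `population_variance` are illustrative, not taken from the original text or from any library. The sketch also computes the population variance, one of the measurements mentioned above, to show that the same outlier inflates it even more than it shifts the mean.

```python
# Minimal sketch: compare the mean and population variance of the two datasets.
# The helper names below are hypothetical, chosen for illustration only.

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def population_variance(values):
    """Average squared deviation from the mean (population variance)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

A = [1, 2, 3, 4]
B = [1, 2, 3, 4, 100]  # same as A plus the outlier 100

mean_a, mean_b = mean(A), mean(B)
var_a, var_b = population_variance(A), population_variance(B)

print(f"mean(A) = {mean_a}, mean(B) = {mean_b}")  # mean(A) = 2.5, mean(B) = 22.0
print(f"var(A) = {var_a}, var(B) = {var_b}")      # var(A) = 1.25, var(B) = 1522.0
```

Adding the single value 100 multiplies the mean by about 9 but the variance by more than 1,000 (1522 / 1.25 ≈ 1218), because squared deviations amplify the distance of the outlier from the rest of the data.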
Multiple linear regression ...