To optimize the parameters of our linear model, we'd like to devise a cost function, also called a **loss function**, that quantifies how closely our predictions fit the data. We cannot simply sum the signed residuals, because positive and negative residuals cancel each other out: a model with large errors in both directions could still produce a sum close to zero.
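The cancellation problem is easy to see with a few numbers. The sketch below uses made-up observed and predicted values (they are illustrative assumptions, not data from the text) and defines each residual as the observed value minus the predicted value.

```python
# Hypothetical observations and predictions, chosen only for illustration.
observed = [10.0, 2.0, 7.0, 5.0]
predicted = [5.0, 7.0, 4.0, 8.0]

# Residual = observed value minus predicted value.
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]

# Add up the signed residuals; positive and negative errors offset each other.
signed_sum = sum(residuals)

print(residuals)   # [5.0, -5.0, 3.0, -3.0]
print(signed_sum)  # 0.0
```

Every prediction here misses by at least 3, yet the signed sum reports no error at all.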

Instead, we can square each residual before summing, so that positive and negative residuals both add to the cost. Squaring also penalizes large errors more heavily than small ones, but not so much that the largest residual always dominates.
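As a minimal sketch of this cost, the function below sums the squared residuals. The name `sum_of_squared_residuals` and the sample values are assumptions made for illustration.

```python
def sum_of_squared_residuals(observed, predicted):
    """Return the sum of (observed - predicted) ** 2 over all pairs."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))


# Hypothetical observations and predictions, chosen only for illustration.
observed = [10.0, 2.0, 7.0, 5.0]
predicted = [5.0, 7.0, 4.0, 8.0]

# Residuals are 5, -5, 3, -3, so the cost is 25 + 25 + 9 + 9.
cost = sum_of_squared_residuals(observed, predicted)
print(cost)  # 68.0

# Doubling a single residual from 3 to 6 quadruples its contribution (9 vs. 36).
small = sum_of_squared_residuals([3.0], [0.0])
large = sum_of_squared_residuals([6.0], [0.0])
print(small, large)  # 9.0 36.0
```

With these values the signed residuals sum to zero, while the squared version reports a cost of 68.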

Expressed as an optimization problem, we seek the coefficients that minimize the sum of the squared residuals. This is ...
