In this recipe, we will work with a dataset containing house prices. The goal is to identify influential observations:
- First, we load the dataset and specify the model:
library(car)
data = read.csv("./house_prices_aug.csv")
model = lm(Property_price ~ size + number.bathrooms + number.bedrooms + number.entrances + size_balcony + size_entrance, data = data)
- We can build a simple plot to identify influential observations. The x-axis shows the leverage of each observation, while the y-axis shows the size of its residual. A quick rule of thumb is to flag an observation as influential if its Cook's D is greater than 1. R draws two contour curves on the plot, one for Cook's D = 0.5 and another for Cook's D = 1. Since we usually focus on observations with Cook's D > ...
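
The Cook's D > 1 rule of thumb can be sketched with base R alone. This is a minimal illustration, not the recipe's own code: it assumes a small synthetic data frame in place of house_prices_aug.csv, and the names houses, fit, cooks_d, and flagged, along with the planted outlier, are hypothetical. It uses cooks.distance() to compute the statistic for each row and plot(..., which = 5) to draw the residuals versus leverage plot with the Cook's D contours.

```r
# Synthetic stand-in for the house price data (hypothetical values)
set.seed(1)
n <- 50
houses <- data.frame(size = runif(n, 50, 200))
houses$Property_price <- 1000 * houses$size + rnorm(n, sd = 5000)

# Plant one observation with high leverage and a large residual
houses$size[n] <- 600
houses$Property_price[n] <- 100000

# Fit a simple linear model on the synthetic data
fit <- lm(Property_price ~ size, data = houses)

# Cook's distance for every observation
cooks_d <- cooks.distance(fit)

# Row numbers that exceed the Cook's D > 1 rule of thumb
flagged <- which(cooks_d > 1)
print(flagged)

# Residuals vs. leverage plot, with Cook's D contours drawn by R
plot(fit, which = 5)
```

Computing the distances directly is a useful complement to the plot, since it gives the row numbers of the flagged observations so they can be inspected in the data.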