How to do it...

In this recipe, we will work with a dataset containing house prices. The goal is to identify influential observations:

  1. We first load our dataset, and formulate our model:
     library(car)
     data = read.csv("./house_prices_aug.csv")
     model = lm(Property_price ~ size + number.bathrooms + number.bedrooms +
                number.entrances + size_balcony + size_entrance, data = data)
  2. We can build a simple plot to identify influential observations. The X axis represents the leverage, while the Y axis represents the size of the residual. A quick rule is to flag an observation as influential if its Cook's D is greater than 1. R draws two contour curves, one for Cook's D = 0.5 and another for Cook's D = 1. Since we usually focus on observations with Cook's D > ...
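
As a rough illustration of the quick rule described in this step, the following sketch computes leverage, standardized residuals, and Cook's D numerically and flags observations with Cook's D greater than 1. It is not the recipe's own code: the data is simulated rather than read from the house prices file, the `houses`, `diagnostics`, and `flagged` names are hypothetical, and the model uses a single predictor to keep the example short. The functions `hatvalues()`, `rstandard()`, `cooks.distance()`, and `plot(model, which = 5)` come from base R's stats package.

```r
# Simulated stand-in for the house price data
# (assumption: the recipe's CSV file is not used here)
set.seed(1)
n <- 50
size <- runif(n, min = 50, max = 150)
Property_price <- 2000 * size + rnorm(n, sd = 10000)

# Append one deliberately unusual house: very large, yet very cheap
houses <- data.frame(
  size = c(size, 400),
  Property_price = c(Property_price, 100000)
)

model <- lm(Property_price ~ size, data = houses)

# Residuals vs Leverage plot; base R draws Cook's distance
# contours at 0.5 and 1 by default
plot(model, which = 5)

# The same quantities, computed numerically for each observation
diagnostics <- data.frame(
  leverage     = hatvalues(model),
  std_residual = rstandard(model),
  cooks_d      = cooks.distance(model)
)

# Quick rule from the text: flag observations with Cook's D greater than 1
flagged <- which(diagnostics$cooks_d > 1)
print(diagnostics[flagged, ])
```

In this simulated data, the appended house combines high leverage with a large residual, so it should appear among the flagged rows. With real data, a flagged observation is a candidate for closer inspection rather than automatic removal.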
