Before we can start building clustering models, we need to complete five tasks to clean up our data and prepare it for modeling. The clean-up steps are as follows:
- Dropping canceled orders: Canceled orders appear as records with a negative Quantity, so we are going to keep only the records with a positive Quantity, using the following code:
df <- df[which(df$Quantity > 0),]
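- Dropping records with no CustomerID: There are 133,361 records with no CustomerID, and we are going to drop those records with the following code. Note that na.omit removes every row that has a missing value in any column, not only in CustomerID: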
df <- na.omit(df)
- Excluding an incomplete month: As you might recall from previous chapters, the data for December 2011 is incomplete. You can exclude this data with the following code:
df <- df[which(df$InvoiceDate < '2011-12-01'),]
- Computing total sales from ...