Data cleanup

Before we build clustering models, we need to complete five tasks to clean up the data and prepare it for modeling. The clean-up steps are as follows:

  1. Dropping canceled orders: Canceled orders appear as records with a negative Quantity. We keep only the records with a positive Quantity, using the following code:
        df <- df[which(df$Quantity > 0),]
  2. Dropping records with no CustomerID: There are 133,361 records with no CustomerID. We drop them with the following code; note that na.omit removes rows with a missing value in any column, not only in CustomerID:
        df <- na.omit(df)
  3. Excluding an incomplete month: As you might recall from previous chapters, the data for December 2011 is incomplete. You can exclude this data with the following code:
        df <- df[which(df$InvoiceDate < '2011-12-01'),]
  4. Computing total sales from ...
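
To see what the first three steps do, here is a minimal sketch on a toy data frame. The rows and values are made up for illustration, InvoiceDate is assumed to be stored as POSIXct, and the `cutoff` variable is a hypothetical name introduced here. It also filters on CustomerID directly rather than using na.omit, which is one possible way to avoid dropping rows that have missing values in other columns.

```r
# Toy stand-in for the chapter's df; all values are invented for illustration.
df <- data.frame(
  InvoiceNo   = c("536365", "C536379", "536380", "581587", "536381"),
  Quantity    = c(6, -1, 24, 12, 3),
  CustomerID  = c(17850, 14527, NA, 12680, 13047),
  InvoiceDate = as.POSIXct(
    c("2010-12-01 08:26:00", "2010-12-01 09:41:00", "2010-12-01 09:41:00",
      "2011-12-09 12:50:00", "2011-11-30 17:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Step 1: keep only rows with a positive Quantity (drops canceled orders).
df <- df[which(df$Quantity > 0), ]

# Step 2: drop rows with a missing CustomerID only (assumption: a narrower
# alternative to na.omit(df), which checks every column).
df <- df[!is.na(df$CustomerID), ]

# Step 3: exclude December 2011 by comparing against an explicit cutoff.
cutoff <- as.POSIXct("2011-12-01", tz = "UTC")  # hypothetical helper variable
df <- df[which(df$InvoiceDate < cutoff), ]

print(df)
```

Of the five toy rows, one is removed by each step (the canceled order, the record with no CustomerID, and the December 2011 record), so two rows remain.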
