Data cleanup

Before we can start building clustering models, we need to complete five tasks to clean up our data and prepare it for modeling. The cleanup steps are as follows:

  1. Dropping canceled orders: We are going to drop records with a negative Quantity by keeping only the rows where Quantity is greater than zero, using the following code:
        df = df.loc[df['Quantity'] > 0]
  1. Dropping records with no CustomerID: There are 133,361 records with no CustomerID, and we are going to drop them with the following code:
        df = df[pd.notnull(df['CustomerID'])]
  1. Excluding an incomplete month: As you might recall from previous chapters, the data for December 2011 is incomplete. You can exclude it by keeping only the records with an InvoiceDate before December 1, 2011, using the following code:
        df = df.loc[df['InvoiceDate'] < '2011-12-01']
  1. Computing ...
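
As a rough illustration, the first three steps above can be sketched end to end on a tiny, made-up DataFrame. The sample rows and the helper name clean_orders are hypothetical and only mimic the Quantity, CustomerID, and InvoiceDate columns used above. The sketch also assumes that InvoiceDate is already stored as a datetime column.

```python
import pandas as pd

# Hypothetical sample rows (illustrative values only, not taken from the real dataset)
sample = pd.DataFrame({
    'InvoiceNo': ['536365', 'C536379', '536380', '581587'],
    'Quantity': [6, -1, 24, 12],
    'CustomerID': [17850.0, 14527.0, None, 12680.0],
    'InvoiceDate': pd.to_datetime([
        '2010-12-01 08:26',
        '2010-12-01 09:41',
        '2011-06-15 10:00',
        '2011-12-09 12:50',
    ]),
})


def clean_orders(df):
    """Apply the first three cleanup steps (hypothetical helper)."""
    # Step 1: keep only rows with a positive Quantity (drops canceled orders)
    df = df.loc[df['Quantity'] > 0]
    # Step 2: drop rows that have no CustomerID
    df = df[pd.notnull(df['CustomerID'])]
    # Step 3: exclude the incomplete month of December 2011
    df = df.loc[df['InvoiceDate'] < '2011-12-01']
    return df


cleaned = clean_orders(sample)
print(cleaned)
```

In this sample, only the first row passes all three filters: the second row has a negative Quantity, the third has no CustomerID, and the fourth falls in December 2011.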

