Before we can start building clustering models, we need to complete five tasks to clean up our data and prepare it for modeling. The clean-up steps are as follows:
- Dropping canceled orders: Canceled orders appear as records with a negative Quantity. We are going to keep only the records with a positive Quantity (which also removes any zero-quantity rows), using the following code:
df = df.loc[df['Quantity'] > 0]
- Dropping records with no CustomerID: There are 133,361 records with no CustomerID. We are going to drop those records with the following code:
df = df[pd.notnull(df['CustomerID'])]
- Excluding an incomplete month: As you might recall from previous chapters, the data for December 2011 is incomplete. You can exclude it by keeping only the records dated before December 1, 2011, with the following code:
df = df.loc[df['InvoiceDate'] < '2011-12-01']
- Computing ...
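As a quick sanity check, the first three clean-up steps above can be wrapped in one function and run on a tiny made-up DataFrame. This is only a minimal sketch: the function name clean_orders, its cutoff parameter, and the sample rows are hypothetical and are not part of the original dataset or code. It also assumes that InvoiceDate is already stored as a datetime column.

```python
import pandas as pd


def clean_orders(df, cutoff="2011-12-01"):
    """Apply the first three clean-up filters and return the filtered rows.

    Hypothetical helper for illustration; `cutoff` is an assumed parameter.
    """
    # Step 1: keep only rows with a positive Quantity (drops canceled orders).
    df = df.loc[df["Quantity"] > 0]
    # Step 2: drop rows where CustomerID is missing.
    df = df.loc[df["CustomerID"].notna()]
    # Step 3: keep only invoices dated before the cutoff (excludes the incomplete month).
    df = df.loc[df["InvoiceDate"] < pd.Timestamp(cutoff)]
    return df


# Toy data invented for this example; these are not rows from the real dataset.
sample = pd.DataFrame({
    "Quantity": [6, -1, 3, 2, 4],
    "CustomerID": [17850.0, 17850.0, None, 13047.0, 13047.0],
    "InvoiceDate": pd.to_datetime([
        "2010-12-01 08:26",  # kept
        "2011-03-15 10:00",  # dropped: negative Quantity
        "2011-06-01 12:00",  # dropped: missing CustomerID
        "2011-12-05 09:00",  # dropped: falls in December 2011
        "2011-11-30 17:45",  # kept
    ]),
})

cleaned = clean_orders(sample)
print(cleaned)
```

Comparing len(sample) with len(cleaned) after each step is a simple way to confirm how many records each filter removes.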