Preparing the data for aggregation

Before summarizing the data at the purchase level, some preparatory work is needed. Rows missing the invoice number need to be excluded. It also makes sense to remove rows that are missing the CustomerID, since this is the field needed to combine invoices for the same customer, and invoices without it cannot be linked together. It will also be useful to calculate the total cost for each row by multiplying the unit price by the quantity. The following SPSS syntax was used to select only the rows with non-missing values for the CustomerID and InvoiceNo fields. It also creates the total cost field (itemcost) and requests an updated set of descriptive statistics:

SELECT IF (not(missing(CustomerID)) ...
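For readers who want to see the same three steps outside SPSS, the logic can be sketched in plain Python. This is only an illustration of the approach described above, not the book's syntax: the sample rows are made up, the field names Quantity and UnitPrice are assumed, and the helpers is_missing, prepare, and describe are hypothetical names introduced here.

```python
import statistics

# Made-up sample rows for illustration only; Quantity and UnitPrice are
# assumed field names, not confirmed by the text.
rows = [
    {"InvoiceNo": "A001", "CustomerID": 101, "Quantity": 6, "UnitPrice": 2.5},
    {"InvoiceNo": "A001", "CustomerID": 101, "Quantity": 2, "UnitPrice": 4.0},
    {"InvoiceNo": "A002", "CustomerID": None, "Quantity": 1, "UnitPrice": 3.0},
    {"InvoiceNo": None, "CustomerID": 102, "Quantity": 3, "UnitPrice": 1.0},
]


def is_missing(value):
    """Treat None and the empty string as missing (an assumption of this sketch)."""
    return value is None or value == ""


def prepare(rows):
    """Keep rows with both CustomerID and InvoiceNo present and add itemcost."""
    kept = []
    for row in rows:
        if is_missing(row["CustomerID"]) or is_missing(row["InvoiceNo"]):
            continue  # drop rows that cannot be linked to a customer or invoice
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row["itemcost"] = row["UnitPrice"] * row["Quantity"]
        kept.append(new_row)
    return kept


def describe(values):
    """Return a small set of descriptive statistics for a non-empty list."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }


prepared = prepare(rows)
summary = describe([row["itemcost"] for row in prepared])
print(summary)
```

Filtering before computing itemcost mirrors the order described in the text: only rows that can be tied to both an invoice and a customer contribute to the updated descriptive statistics.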
