APPENDIX F A DATA MINING EXAMPLE

INTRODUCTION

This example uses just one statistical technique, the a-priori algorithm, which finds association rules in data. The algorithm considers only items that appear in more than a certain percentage of the records. This percentage is the ‘support threshold’.
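To make the idea of support concrete, the following minimal sketch computes the fraction of visits in which a given combination of products appears. The function name `support`, the variable names and the four baskets are all invented for illustration; they are not taken from the supermarket data discussed in this appendix.

```python
from fractions import Fraction


def support(itemset, visits):
    """Return the fraction of visits whose basket contains every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in visits if itemset <= basket)
    return Fraction(hits, len(visits))


# Hypothetical baskets, invented purely for illustration (not the book's data).
visits = [
    {"own-label bread", "own-label milk"},
    {"branded bread", "branded milk"},
    {"own-label bread", "own-label milk", "branded coffee"},
    {"own-label bread", "branded milk"},
]

# Support of a single product and of a pair of products.
bread_support = support({"own-label bread"}, visits)
pair_support = support({"own-label bread", "own-label milk"}, visits)
```

An exact `Fraction` is used rather than a float so that comparisons against a percentage threshold are not affected by rounding.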

THE SCENARIO

A supermarket chain wishes to determine whether customers opt for either ‘own-label’ products or branded products.

Raw data is available for each customer’s purchases, recording the quantities of each product bought during each supermarket visit. The data from 500 such visits will be investigated.

The support threshold is 15 per cent.
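As a quick check of what this threshold means in absolute terms, the sketch below converts 15 per cent of 500 visits into a visit count. The names `n_visits`, `threshold_percent` and `meets_threshold` are assumptions made for this illustration. The strict comparison follows the 'more than' wording of the introduction; whether a count of exactly 75 should also qualify is a convention this excerpt does not settle.

```python
n_visits = 500          # number of visits investigated (from the scenario)
threshold_percent = 15  # support threshold, in per cent (from the scenario)

# 15 per cent of 500 visits, computed with integers to avoid rounding issues.
min_count = n_visits * threshold_percent // 100


def meets_threshold(count):
    """Assumed rule: a count qualifies if it is MORE than 15 per cent of visits."""
    return count * 100 > threshold_percent * n_visits
```

Under this assumption, an item has to appear in more than 75 of the 500 visits to be kept.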

Step 1

The raw data is scanned to determine the frequency of each product category bought during a visit. The results satisfying ...
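The scan described in Step 1 might be sketched as follows. This is only an illustration under stated assumptions: each visit is reduced to the set of product categories bought (purchase quantities are ignored), a category is counted once per visit, and a category is kept if it appears in more than the threshold percentage of visits. The function name `frequent_categories` and the five sample visits are hypothetical and do not reproduce the book's data or results.

```python
from collections import Counter


def frequent_categories(visits, threshold_percent):
    """Count the visits containing each category and keep those above the threshold."""
    counts = Counter()
    for basket in visits:
        counts.update(set(basket))  # each category counts once per visit
    n = len(visits)
    # Integer comparison: keep a category if count / n > threshold_percent / 100.
    return {cat: k for cat, k in counts.items() if k * 100 > threshold_percent * n}


# Hypothetical visits, invented for illustration only.
sample_visits = [
    {"own-label bread", "branded milk"},
    {"own-label bread"},
    {"branded milk", "branded coffee"},
    {"own-label bread", "branded milk"},
    {"own-label tea"},
]

# With a 30 per cent threshold over 5 visits, a category needs at least 2 visits.
frequent = frequent_categories(sample_visits, 30)
```

In this toy data only own-label bread and branded milk survive the scan, each appearing in three of the five visits.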

