Any task that involves making decisions from data almost always falls into one of the following categories:

  • Summarizing the data: Summarization is a process in which the data is reduced for interpretation without sacrificing any important information. Summaries can be developed for the data as a whole or for any portion of the data. For example, a retail company that collected data on its transactions could develop summaries of the total sales transactions. The company could also generate summaries of transactions by product or by store.
  • Finding hidden relationships: This refers to the identification of important facts, relationships, anomalies, or trends in the data that are not obvious from a summary alone. Discovering this information involves looking at the data from many angles. For example, a retail company may want to understand customer profiles and other facts that lead to the purchase of certain product lines.
  • Making predictions: Prediction is the process of calculating an estimate for something that is unknown. For example, a retail company may want to use historical data to predict the sort of products that specific consumers may be interested in.
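
The three categories can be sketched on a toy version of the retail example. The snippet below is a minimal illustration, not a method taken from the book: the transaction records, the field layout, and the helper name predict_product are all hypothetical, and the "prediction" is a deliberately naive nearest-average rule chosen only to keep the example short.

```python
from collections import defaultdict

# Hypothetical transactions: (store, product, customer_age, amount).
transactions = [
    ("north", "tea", 25, 4.0),
    ("north", "coffee", 40, 6.0),
    ("south", "tea", 30, 4.0),
    ("south", "coffee", 50, 7.0),
    ("south", "coffee", 45, 6.5),
]

# 1. Summarizing: total sales for the whole data set and per product.
total_sales = sum(amount for _, _, _, amount in transactions)
sales_by_product = defaultdict(float)
for _, product, _, amount in transactions:
    sales_by_product[product] += amount

# 2. Finding a relationship: average customer age for each product,
#    a simple stand-in for a "customer profile".
ages_by_product = defaultdict(list)
for _, product, age, _ in transactions:
    ages_by_product[product].append(age)
mean_age_by_product = {
    product: sum(ages) / len(ages) for product, ages in ages_by_product.items()
}

# 3. Making a prediction: guess which product a new customer may prefer by
#    picking the product whose average buyer age is closest (naive rule).
def predict_product(age):
    return min(
        mean_age_by_product,
        key=lambda product: abs(mean_age_by_product[product] - age),
    )

print(total_sales)
print(dict(sales_by_product))
print(mean_age_by_product)
print(predict_product(28))
```

Note that the prediction step reuses the per-product averages computed in the second step, which in turn depend on the same grouping used for the summaries.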

There is a great deal of interplay between these three tasks. For example, it is important to summarize the data before making predictions or finding hidden relationships. Understanding any hidden relationships between different items in the data can help in generating ...
