Normalizing data

Some datasets are nice to look at but complicated to process further. Take a look at the matches file we saw in Chapter 3:

Match Date;Home Team;Away Team;Result
02/06;Italy;France;2-1
02/06;Argentina;Hungary;2-1
06/06;Italy;Hungary;3-1
06/06;Argentina;France;2-1
10/06;France;Hungary;3-1
10/06;Italy;Argentina;1-0
...

Imagine you want to answer these questions:

  1. How many teams played?
  2. Which team scored the most goals?
  3. Which team won all matches it played?

The dataset is not laid out to answer those questions easily: each team appears sometimes in the Home Team column and sometimes in the Away Team column, and the goals of both teams are packed into a single Result field. To answer the questions in a simple way, you first have to normalize the data, that is, convert it to a suitable format before proceeding. Let's work on it.
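To make the idea concrete before building anything, here is a minimal sketch in Python of what normalizing this file could look like. It is only an illustration, not the transformation developed in this chapter. It assumes that the first number in Result belongs to the home team, and it uses only the six rows shown above, so the answers it gives hold for that sample and not for the full file. The function name normalize and the field names team, scored, and conceded are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Only the six rows shown in the text; the real file has more.
SAMPLE = """Match Date;Home Team;Away Team;Result
02/06;Italy;France;2-1
02/06;Argentina;Hungary;2-1
06/06;Italy;Hungary;3-1
06/06;Argentina;France;2-1
10/06;France;Hungary;3-1
10/06;Italy;Argentina;1-0
"""


def normalize(text):
    """Turn each match row into two rows, one per team that played."""
    rows = []
    for match in csv.DictReader(io.StringIO(text), delimiter=";"):
        # Assumption: Result is "<home goals>-<away goals>".
        home_goals, away_goals = (int(g) for g in match["Result"].split("-"))
        rows.append({"date": match["Match Date"], "team": match["Home Team"],
                     "scored": home_goals, "conceded": away_goals})
        rows.append({"date": match["Match Date"], "team": match["Away Team"],
                     "scored": away_goals, "conceded": home_goals})
    return rows


rows = normalize(SAMPLE)

# 1. How many teams played?
teams = {row["team"] for row in rows}

# 2. Which team scored the most goals?
goals = defaultdict(int)
for row in rows:
    goals[row["team"]] += row["scored"]
top_scorer = max(goals, key=goals.get)

# 3. Which team won all the matches it played?
won_all = sorted(
    team for team in teams
    if all(r["scored"] > r["conceded"] for r in rows if r["team"] == team)
)

print(len(teams), top_scorer, won_all)
```

Once every row describes a single team in a single match, each of the three questions becomes a plain count, sum, or filter over one column.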
