Normalizing and denormalizing data

Database normalization is the process of designing a database schema to reduce data duplication and redundancy. A database that is not designed with normalization principles in mind can:

  • Grow overly large due to duplicated data
  • Make data maintenance difficult, or give rise to data integrity issues, when the same data values reside in multiple tables

While we are not directly concerned with database schema design in this chapter, our next two examples look at processing operations born of the same principles as database normalization, so readers who aren't familiar with the concepts may wish to read some introductory material first. For a good primer on database normalization, go to http://en.wikipedia.org/wiki/Database_normalization.
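To make the idea concrete before we turn to the examples, here is a minimal sketch in plain Python (not Talend code; the field and function names are invented for illustration) of the two operations in question: normalizing a multi-valued, delimited field into one row per value, and denormalizing those rows back into a single delimited field.

```python
def normalize(rows, field, sep="|"):
    """Split the delimited values in `field` into one output row per value."""
    out = []
    for row in rows:
        for value in row[field].split(sep):
            new_row = dict(row)          # copy the row's other columns
            new_row[field] = value.strip()
            out.append(new_row)
    return out

def denormalize(rows, field, key, sep="|"):
    """Regroup rows that share `key`, joining their `field` values back together."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[key], []).append(row[field])
    return [{key: k, field: sep.join(values)} for k, values in grouped.items()]

# Hypothetical input: one order row holding several items in a single field.
orders = [{"order_id": 1, "items": "apple|pear"},
          {"order_id": 2, "items": "plum"}]

flat = normalize(orders, "items")
# One row per item value:
# [{"order_id": 1, "items": "apple"},
#  {"order_id": 1, "items": "pear"},
#  {"order_id": 2, "items": "plum"}]

regrouped = denormalize(flat, "items", "order_id")
# Back to one row per order, with the items re-joined.
```

The round trip illustrates why the two operations are inverses of each other, which is the shape the chapter's next two examples take.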
