Chapter 4. Data Normalization

Normalization produces highly cohesive and loosely coupled data schemas. Denormalization improves performance. Make the trade-offs wisely.

Data normalization is a process in which data attributes within a data model are organized to increase the cohesion of entity types and to reduce the coupling between entity types. The goal of data normalization is to reduce, or even eliminate, data redundancy. This is an important consideration for application developers because storing objects in a relational database becomes very difficult when a single data attribute is stored in several places.
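To make the redundancy problem concrete, here is a minimal sketch using hypothetical customer/order tables (these are illustrative names, not the chapter's own examples). In the denormalized design the customer's name is repeated on every order row, so a simple name change must touch many rows; in the normalized design it lives in one place only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized: the customer's name is repeated on every order row,
# so renaming the customer is a multi-row update (an update anomaly).
conn.execute("""CREATE TABLE order_flat (
    order_id INTEGER PRIMARY KEY,
    customer_name TEXT,
    item TEXT)""")
conn.executemany(
    "INSERT INTO order_flat VALUES (?, ?, ?)",
    [(1, "Ann Smith", "Widget"), (2, "Ann Smith", "Gadget")])

# Normalized: the name is stored once, referenced by foreign key.
conn.execute("""CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT)""")
conn.execute("""CREATE TABLE order_norm (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    item TEXT)""")
conn.execute("INSERT INTO customer VALUES (1, 'Ann Smith')")
conn.executemany(
    "INSERT INTO order_norm VALUES (?, ?, ?)",
    [(1, 1, "Widget"), (2, 1, "Gadget")])

# Renaming the customer touches two rows in the flat design,
# but only one row in the normalized design.
flat_rows = conn.execute(
    "UPDATE order_flat SET customer_name = 'Ann Jones'").rowcount
norm_rows = conn.execute(
    "UPDATE customer SET name = 'Ann Jones' WHERE customer_id = 1").rowcount
print(flat_rows, norm_rows)  # 2 1
```

Miss one of those redundant rows in the flat design and the database now holds two different names for the same customer, which is exactly the inconsistency normalization guards against.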

To explore the techniques of data normalization, this chapter addresses the following topics:

  • The first three normal forms

  • Why data normalization?

  • The role of the agile DBA

  • First normal form (1NF)

  • Second normal form (2NF)

  • Third normal form (3NF)

  • Beyond 3NF

Why Data Normalization?

The advantage of having a highly normalized data schema is that information is stored in one place and one place only, reducing the possibility of inconsistent data. Furthermore, highly normalized data schemas are generally closer conceptually to object-oriented schemas, because the object-oriented goals of promoting high cohesion and loose coupling between classes result in similar solutions (at least from a data point of view). This generally makes it easier to map your objects to your data schema.

Unfortunately, normalization usually comes at a performance cost. With the data schema of Figure 4.1 all the ...
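The performance cost typically shows up at read time: data that a denormalized schema could return from a single-table scan must instead be reassembled with joins. A minimal sketch, again using hypothetical customer/order tables rather than the schema of Figure 4.1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "order" (order_id INTEGER PRIMARY KEY,
                          customer_id INTEGER REFERENCES customer(customer_id),
                          item TEXT);
    INSERT INTO customer VALUES (1, 'Ann Smith');
    INSERT INTO "order" VALUES (100, 1, 'Widget');
""")

# A denormalized schema could answer this query from one table;
# the normalized schema needs a join, which is the usual price of
# storing each fact in exactly one place.
row = conn.execute("""
    SELECT o.order_id, c.name, o.item
    FROM "order" AS o
    JOIN customer AS c USING (customer_id)
""").fetchone()
print(row)  # (100, 'Ann Smith', 'Widget')
```

With only a handful of rows the join is free; across millions of rows and several joined tables, it is the overhead that motivates selective denormalization.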
