Data Design

Tables and columns present a logical view of the data in a relational database. The flexibility of a relational database gives you many options for grouping the atomic pieces of data, represented by the columns, into a set of tables. To use Oracle effectively, you must understand and follow some firmly established principles of database design.

The topic of database design is vast and deep: we won’t even pretend to offer more than a cursory overview. For more information, we recommend the book Oracle Design by Dave Ensor and Ian Stevenson (see Appendix B for details).

When E. F. Codd created the concept of a relational database in the 1960s, he also began work on the concept of normalized data design. The theory behind normalized data design is pretty straightforward: a table should contain only the information that’s directly related to the key value of the table. The process of assembling these logical units of information is called normalization of the database design.

Normalized Forms

In fact, there is more than one type of normalization. Each step in the normalization process ends with a specific result called a normalized form . There are five standard normalized forms, which are referred to as first normal form (1NF), second normal form (2NF), and so on. The normalization process that we describe briefly in this section results in third normal form (3NF), the most common type of normalization.

Explaining the complete concepts that lie behind the different normal forms is beyond the scope of this chapter and book.

The concept of normalized table design was tailored to the capabilities of the relational database. Because you could join data from different tables together in a query, there was no need to keep all the information associated with a particular object together in a single record. You could decompose the information into associated units and simply join the appropriate units together when you needed information that crossed table boundaries.

There are many different methodologies for normalizing data. Here’s one example. You start by defining all the data required by your application:

Identify the objects your application needs to know (the entities ). Examples of entities, as shown in Figure 4-3, include employees, locations, and jobs.
Identify the individual pieces of data, referred to by data modelers as attributes , for these entities. In Figure 4-3, employee name and salary are attributes. Typically, entities correspond to tables and attributes correspond to columns.
As a potential last step in the process, identify relationships between the entities based on your business. These relationships are implemented in the database schema through the use of structures such as foreign keys. For example, the primary key of the DEPARTMENT NUMBER table would be a foreign key column in the EMPLOYEE NAME table used to identify the DEPARTMENT NUMBER in which an employee works.

Normalization is good because it avoids redundant data. Storing the department in every employee record not only would waste space but also would lead to an unmanageable data maintenance issue. If the department name changed, you would have to update every employee record. By normalizing the department data into a table and simply pointing to the appropriate row from the employee rows, you avoid duplication of data and the associated problems.

However, there is an even more important reason to go through the process of designing a normalized database. You can benefit from normalization because of the planning process that normalizing a data design entails. By really thinking about the way the intended applications use data, you get a much clearer picture of the needs the system is designed to serve. This understanding leads to a much more focused database and application.

Normalization also reduces the amount of data that any one row in a table contains. The less data in a row, the less I/O is needed to retrieve it, which helps to avoid this performance bottleneck. In addition, the smaller the data in a row, the more rows are retrieved per data block, which increases the likelihood that more than one desired row will be retrieved in an I/O operation. And the smaller the row, the more rows will be kept in Oracle’s system buffers, which also increases the likelihood that a row will be available in memory when it’s needed, thereby avoiding the need for any disk I/O at all.

Finally, the process of normalization includes the creation of foreign key relationships and other data constraints. These relationships add a level of data integrity to your database design. We’ll describe these relationships in more detail in Section 4.5.

Figure 4-3 shows a simple list of attributes grouped into entities and linked by a foreign key relationship.

Figure 4-3. The normalization process

Creating a normalized data design isn’t the only data design work you will have to do. Once you’ve completed an optimal logical database design, you must go back and consider what indexes you should add to improve the anticipated performance of the database and whether you should designate any tables as part of a cluster or hash cluster.

Since adding these types of performance-enhancing data structures doesn’t affect the logical representation of the database, you can always make these types of modifications later when you see the way an application uses the database in test mode or in production.

Should You Normalize Your Data?

Whenever possible, we recommend that you go through the process of designing a normalized structure for your database.

Data normalization has been proven, both theoretically and in decades of practice, to provide concrete benefits. In addition, the process of creating a normalized data design is intimately intertwined with the process of understanding the data requirements for your application system. You can improve even the simplest database by the discoveries made during the process of normalization.

However, there may be times when you feel that the benefits of a fully normalized design will counteract the performance penalty that a design imposes on your production systems. For example, you may have one, two, or three contact names to be placed in their own table, with a foreign key linking back to the main row for the organization. But because you want to see all the contact names every time you request contact information, you might decide to save the overhead of the join and simply include the three contact names in your organization table. This technique is common in decision-support/data warehousing applications.

Of course, this violation of the rules of normalization limits the flexibility of your application systems—for example, if you later decide that you need four contact names, some redesign will be necessary.

For this reason, we suggest that you always implement a fully normalized database design and then, if necessary, go back and denormalize certain tables as needed. With this approach, you will at least have to make a conscious decision to “break” the normalization, which involves an active consideration of the price of denormalization.

Get Oracle Essentials: Oracle9i, Oracle8i and Oracle8, Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

Oracle Essentials: Oracle9i, Oracle8i and Oracle8, Second Edition by Rick Greenwald, Robert Stackowiak, Jonathan Stern

Data Design

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly