BUY THIS BOOK
Add to Cart

Print Book $34.95


Add to Cart

PDF $27.99

Safari Books Online

What is this?

Add to UK Cart

Print Book £24.95

What is this?

Looking to Reprint or License this content?


Access Database Design & Programming
Access Database Design & Programming, Third Edition

By Steven Roman, Ph.D.
Book Price: $34.95 USD
£24.95 GBP
PDF Price: $27.99

Cover | Table of Contents | Colophon


Table of Contents

Chapter 1: Introduction
As mentioned in the Preface, one purpose of this book is to explain the basic concepts of modern relational-database theory and show how these concepts are realized in Microsoft Access. Allow me to amplify on this rather lofty goal.
To take a very simple view, which will do nicely for the purposes of this introductory discussion, a database is just a collection of related data. A database management system , or DBMS, is a system that is designed for two main purposes:
  • To add, delete, and update the data in the database
  • To provide various ways to view (on screen or in print) the data in the database
If the data is simple and there is not very much of it, then a database can consist of a single table. In fact, a simple database can easily be maintained even with a word processor!
To illustrate, suppose you want to set up a database for the books in a library. Purely for the sake of illustration, suppose the library contains 14 books. The same discussion would apply to a library of perhaps a few hundred books. Table 1-1 shows the LIBRARY_FLAT database in the form of a single table.
Table 1-1: The LIBRARY_FLAT sample database
ISBN
Title
AuID
AuName
AuPhone
PubID
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Database Design
As mentioned in the Preface, one purpose of this book is to explain the basic concepts of modern relational-database theory and show how these concepts are realized in Microsoft Access. Allow me to amplify on this rather lofty goal.
To take a very simple view, which will do nicely for the purposes of this introductory discussion, a database is just a collection of related data. A database management system , or DBMS, is a system that is designed for two main purposes:
  • To add, delete, and update the data in the database
  • To provide various ways to view (on screen or in print) the data in the database
If the data is simple and there is not very much of it, then a database can consist of a single table. In fact, a simple database can easily be maintained even with a word processor!
To illustrate, suppose you want to set up a database for the books in a library. Purely for the sake of illustration, suppose the library contains 14 books. The same discussion would apply to a library of perhaps a few hundred books. Table 1-1 shows the LIBRARY_FLAT database in the form of a single table.
Table 1-1: The LIBRARY_FLAT sample database
ISBN
Title
AuID
AuName
AuPhone
PubID
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Database Programming
The motivation for learning database programming is quite simple—power. If you want to have as much control over your databases as possible, you will need to do some programming. In fact, even some simple things require programming. For instance, there is no way to retrieve the list of fields of a given table using the Access graphical interface—you can only get this list through programming. (You can view such a list in the table-design mode of the table, but you cannot get access to this list in order to, for example, present the end-user with the list and ask if she wishes to make any changes to it.)
In addition, programming may be the only way to access and manipulate a database from within another application. For instance, if you are working in Microsoft Excel, you can create and manipulate an Access database with as much power as with Access itself, but only through programming! The reason is that Excel does not have the capability to render graphical representations of database objects. Instead you can create the database within Access and then manipulate it programmatically from within Excel.
It is also worth mentioning that programming can give you a great sense of satisfaction. There is nothing more pleasing than watching a program that you have written step through the rows of a table and make certain changes that you have requested. It is often easier to write a program to perform an action such as this than to remember how to perform the same action using the graphical interface. In short, programming is not only empowering, but it also sometimes provides the simplest route to a particular end.
And let us not forget that programming can be just plain fun!
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 2: The Entity-Relationship Model of a Database
Let us begin our discussion of database design by looking at an informal database model called the entity-relationship model . This model of a relational database provides a useful perspective, especially for the purposes of the initial database design.
I will illustrate the general principles of this model with the LIBRARY database example, which I will carry through the entire book. This example database is designed to hold data about the books in a certain library. The amount of data we will use will be kept artificially small—just enough to illustrate the concepts. (In fact, at this point, you may want to take a look at the example database. For details on downloading it from the Internet, or on using Microsoft Access to create it yourself, see Appendix D .) In the next chapter, we will actually implement the entity-relationship (E/R) model for our LIBRARY database.
A database may be defined as a collection of persistent data. The term persistent is somewhat vague, but is intended to imply that the data has a more-or-less independent existence or that it is semipermanent. For instance, data stored on paper in a filing cabinet, or stored magnetically on a hard disk, CD-ROM, or computer tape is persistent, whereas data stored in a computer's memory is generally not considered to be persistent. (The term permanent is a bit too strong, since very little in life is truly permanent.)
Of course, this is a very general concept. Most real-life databases consist of data that exist for a specific purpose and are thus persistent.
The purpose of a database is to store information about certain types of objects. In database language, these objects are called entities . For example, the entities of the LIBRARY database include books, authors, and publishers.
It is very important at the outset to make a distinction between the entities that are contained in a database at a given time and the world of all possible entities that the database might contain. The reason this is important is that the contents of a database are constantly changing and we must make decisions based not just on what is contained in a database at a given time, but on what might be contained in the database in the future.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
What Is a Database?
A database may be defined as a collection of persistent data. The term persistent is somewhat vague, but is intended to imply that the data has a more-or-less independent existence or that it is semipermanent. For instance, data stored on paper in a filing cabinet, or stored magnetically on a hard disk, CD-ROM, or computer tape is persistent, whereas data stored in a computer's memory is generally not considered to be persistent. (The term permanent is a bit too strong, since very little in life is truly permanent.)
Of course, this is a very general concept. Most real-life databases consist of data that exist for a specific purpose and are thus persistent.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Entities and Their Attributes
The purpose of a database is to store information about certain types of objects. In database language, these objects are called entities . For example, the entities of the LIBRARY database include books, authors, and publishers.
It is very important at the outset to make a distinction between the entities that are contained in a database at a given time and the world of all possible entities that the database might contain. The reason this is important is that the contents of a database are constantly changing and we must make decisions based not just on what is contained in a database at a given time, but on what might be contained in the database in the future.
For example, at a given time, our LIBRARY database might contain 14 book entities. However, as time goes on, new books may be added to the database, and old books may be removed. Thus, the entities in the database are constantly changing. If, for example, based on the fact that the 14 books currently in the database have different titles, we decide to use the title to identify each book uniquely, we may be in for some trouble when, later on, a different book arrives at the library with the same title as a previous book.
The world of all possible entities of a specific type that a database might contain is referred to as an entity class . We will use italics to denote entity classes. Thus, for instance, the world of all possible books is the Books entity class, and the world of all possible authors is the Authors entity class.
We emphasize that an entity class is just an abstract description of something, whereas an entity is a concrete example of that description. The entity classes in our very modest LIBRARY example database are (at least so far):
  • Books
  • Authors
  • Publishers
The set of entities of a given entity class that are in the database at a given time is called an
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Keys and Superkeys
A set of attributes that uniquely identifies any entity from among all possible entities in the entity class that may appear in the database is called a superkey for the entity class. Thus, the set {ISBN} is a superkey for the Books entity class, and the sets {PubID} and {PubName, PubPhone} are both superkeys for the Publishers entity class.
Note that there is a bit of subjectivity in this definition of superkey, since it depends ultimately on our decision about which entities may ever appear in the database, and this is probably something of which we cannot be absolutely certain. Consider, for instance, the Books entity class. There is no law that says all books must have an ISBN (and many books do not). Also, there is no law that says that two books cannot have the same ISBN. (The ISBN is assigned, at least in part, by the publisher of the book.) Thus, the set {ISBN} is a superkey only if we are willing to accept the fact that all books that the library purchases have distinct ISBNs or that the librarian will assign a uniqueersatz ISBN to any books that do not have a real ISBN.
It is important to emphasize that the concept of a superkey applies to entity classes, and not entity sets. Although we can define a superkey for an entity set, this is of limited use, since what may serve to identify the entities uniquely in a particular entity set may fail to do so if we add new entities to the set. To illustrate, the Title attribute does serve to identify each of the 14 books uniquely in the BOOKS table. Thus, {Title} is a superkey for the entity set described by the BOOKS table. However, {Title} is not a superkey for the Books entity class, since there are many distinct books with the same title.
We have remarked that {ISBN} is a superkey for the Books entity class. Of course, so is {Title, ISBN}, but it is wasteful and inefficient to include the Title attribute purely for the sake of identification.
Indeed, one of the difficulties with superkeys is that they may contain more attributes than is absolutely necessary to indentify any entity uniquely. It is more desirable to work with superkeys that do not have this property. A superkey is called a key when it has the property that no proper subset of it is also a superkey. Thus, if we remove an attribute from a key, the resulting set is no longer a superkey. Put more succinctly, a key is a minimal superkey. Sometimes keys are called
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Relationships Between Entities
If we are going to model a database as a collection of entity sets (tables), then we also need to describe the relationships between these entity sets. For instance, an author relationship exists between a book and the authors who wrote that book. We might call this relationship WrittenBy. Thus, Hamlet is WrittenBy Shakespeare.
It is possible to draw a diagram, called an entity-relationship diagram, or E/R diagram,to illustrate the entity classes in a database model, along with their attributes and relationships. Figure 2-1 shows the LIBRARY E/R diagram, with an additional entity class called Contributors (a contributor may be someone who contributes to or writes only a very small portion of a book, and thus may not be accorded all of the rights of an author, such as a royalty).
Figure 2-1: The LIBRARY entity-relationship diagram
Note that each entity class is denoted by a rectangle, and each attribute by an ellipse. The relations are denoted by diamonds. We have included the Contributors entity class in this model merely to illustrate a special type of relationship. In particular, since a contributor is considered an author, there is an IsA relationship between the two entity classes.
The model represented by an E/R diagram is sometimes referred to as a semantic model since it describes much of the meaning of the database.
Referring to Figure 2-1, the symbols 1 and ∞ represent the type of relationship between the corresponding entity classes. (The symbol ∞ is read "many.") Relationships can be classified into three types. For instance, the relationship between Books and Authors is many-to-many, meaning that a book may have many authors and an author may write many books. On the other hand, the relationship from
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 3: Implementing Entity-Relationship Models: Relational Databases
An E/R model of a database is an abstract model, visualized through an E/R dia-gram. For this to be useful, we must translate the abstract model into a concrete one. That is, we must describe each aspect of the model in the concrete terms that a database program can manipulate. In short, we must implement the E/R model. This requires implementing several things:
  • The entities
  • The entity classes
  • The entity sets
  • The relationships between the entity classes
The result of this implementation is a relational database.
As we will see, implementing the relationships usually involves some changes to the entity classes, perhaps by adding new attributes to existing entity classes or by adding new entity classes.
As discussed in the previous chapter, an entity is implemented (or described in concrete terms) simply by giving the values of its attributes. Thus, the following is an implementation of a Books entity:
Title = Gone With the Wind
ISBN = 0-12-345678-9
Price = $24.00
Since the entities in an entity class are implemented by giving their attribute values, it makes sense to implement an entity class by the set of attribute names. For instance, the Books entity class can be identified with the set:
{ISBN,Title,Price}
(We will add the PubID attribute name later, when we implement the relationships.)
Since attribute names are usually used as column headings for a table, a set of attribute names is called a table scheme. Thus, entity classes are implemented as table schemes. For convenience, we use notation such as:
Books(ISBN,Title,Price)
which shows not only the name of the entity class, but also the names of the attributes in the table scheme for this class. You can also think of a table scheme as the column headings row (the top row) of any table that is formed using that table scheme. (I will present an example of this shortly.)
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Implementing Entities
As discussed in the previous chapter, an entity is implemented (or described in concrete terms) simply by giving the values of its attributes. Thus, the following is an implementation of a Books entity:
Title = Gone With the Wind
ISBN = 0-12-345678-9
Price = $24.00
Since the entities in an entity class are implemented by giving their attribute values, it makes sense to implement an entity class by the set of attribute names. For instance, the Books entity class can be identified with the set:
{ISBN,Title,Price}
(We will add the PubID attribute name later, when we implement the relationships.)
Since attribute names are usually used as column headings for a table, a set of attribute names is called a table scheme. Thus, entity classes are implemented as table schemes. For convenience, we use notation such as:
Books(ISBN,Title,Price)
which shows not only the name of the entity class, but also the names of the attributes in the table scheme for this class. You can also think of a table scheme as the column headings row (the top row) of any table that is formed using that table scheme. (I will present an example of this shortly.)
We have defined the concepts of a superkey and a key for entity classes. These concepts apply equally well to table schemes, so we may say that the attributes {A,B} form a key for a table scheme, meaning that they form a key for the entity class implemented by that table scheme.
In a relational database, each entity set is modeled by a table. For example, consider the BOOKS table shown in Table 3-1, and note the following:
  • The first row of the table is the table scheme for the Books entity class.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
A Short Glossary
To help keep the various database terms clear, let us collect their definitions in one place:
Entity
An object about which the database is designed to store information. Example: a book; that is, an ISBN, a title, and a price, as in:
0-12-333433-3, On Liberty, $25.00
Attribute
A property that (partially or completely) describes an entity. Example: title.
Entity class
An abstract group of entities, with a common description. Example: the entity class Books, representing all books in the universe.
Entity set
The set of entities from a given entity class that are currently in the database. Example: the following set of 14 books:
0-12-333433-3, On Liberty, $25.00
0-103-45678-9, Iliad, $25.00
0-91-335678-7, Faerie Queene, $15.00
0-99-999999-9, Emma, $20.00
1-22-233700-0, Visual Basic, $25.00
1-1111-1111-1, C++, $29.95
0-91-045678-5, Hamlet, $20.00
0-555-55555-9, Macbeth, $12.00
0-99-777777-7, King Lear, $49.00
0-123-45678-0, Ulysses, $34.00
0-12-345678-9, Jane Eyre, $49.00
0-11-345678-9, Moby-Dick, $49.00
0-321-32132-1, Balloon, $34.00
0-55-123456-9, Main Street, $22.95
Superkey
A set of attributes for an entity class that serves to identify an entity uniquely from among all possible entities in that entity class. Example: the set {Title, ISBN} for the Books entity class.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Implementing the Relationships in a Relational Database
Now let us discuss how we might implement the relationships in an E/R database model. For convenience, we repeat the E/R diagram for the LIBRARY database in Figure 3-1.
Figure 3-1: The LIBRARY entity-relationship diagram
Implementing a one-to-many relationship, such as the PublisherOf relationship, is fairly easy. To illustrate, since {PubID} is a key for the Publishers entity class, we simply add this attribute to the Books entity class. Thus, the Books entity class becomes:
Books(ISBN,Title,PubID,Price)
The Books table scheme is now:
{ISBN,Title,PubID,Price}
and the BOOKS table now appears as shown in Table 3-2 (sorted by PubID).
Table 3-2: The BOOKS table sorted by PubID
ISBN
Title
PubID
Price
0-12-333433-3
On Liberty
1
$25.00
0-103-45678-9
Iliad
1
$25.00
0-91-335678-7
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
The LIBRARY Relational Database
We can now complete the implementation of the LIBRARY relational database (without the Contributors entity class) in Microsoft Access. If you open the LIBRARY database in Microsoft Access, you will see four tables:
  • AUTHORS
  • BOOK/AUTHOR
  • BOOKS
  • PUBLISHERS
(The LIBRARY_FLAT table is not used in the relational database.)
These four tables correspond to the following four entity classes (or table schemes):
  • Authors (AuID, AuName, AuPhone)
  • Book/Author (ISBN, AuID)
  • Books (ISBN, Title, PubID, Price)
  • Publishers (PubID, PubName, PubPhone)
The actual tables are shown in Tables Table 3-3 through Table 3-6.
Table 3-3: The AUTHORS table from the Access LIBRARY database
AuID
AuName
AuPhone
1
Austen
111-111-1111
10
Jones
123-333-3333
11
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Index Files
When a table is stored on disk, it is often referred to as a file. In this case, each row of the table is referred to as a record , and each column is referred to as a field. (These terms are often used for any table.)
Since disk access is typically slow, an important goal is to reduce the amount of disk accesses necessary to retrieve the desired data from a file. Sequential searching of the data, record-by-record, to find the desired information may require a large number of disk accesses and is very inefficient.
The purpose of an index file is to provide direct (also called random) access to data in a database file.
Figure 3-8 illustrates the concept of an index file. For illustration purposes, we have changed the Publishers data, to include a city column. The file on the left is the index file and indexes the Publishers datafile by the City field, which is therefore called theindexed field. The city file is called an index for the PUBLISHERS table. (The index file is not a table in the same sense as the PUBLISHERS table is a table. That is to say, we cannot directly access the index file—instead we use it indirectly.) The index file contains the cities for each publisher, along with a pointer to the corresponding data record in the Publishers file.
Figure 3-8: Index file between City and Publisher
An index file can be used in a variety of ways. For instance, to find all publishers located in Kansas City, Access can first search the alphabetical list of cities in the index file. Since the list is alphabetical, Access knows that the Kansas City entries are all together, and so once it reaches the first entry after Kansas City, it can stop fcthe search. In other words, Access does not need to search the entire index file. (In addition, there are very efficient search algorithms for ordered tables.) Once the Kansas City entries are found in the index file, the pointers can be used to go directly to the Kansas City publishers in the indexed file.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
NULL Values
The question of NULLs can be very confusing to the database user, so let us set down the basic principles. Generally speaking, a NULL is a special value that is used for two reasons:
  • To indicate that a value is missing or unknown
  • To indicate that a value is not applicable in the current context
For instance, consider an author's table:
AUTHORS(AuID,AuName,AuPhone)
If a particular author's phone number is unknown, it is appropriate for that value to be NULL. This is not to say that the author does not have a phone number, but simply that we have no information about the number—it may or may not exist. If we knew that the person had no phone number, then the information would no longer be unknown. In this case, the appropriate value of the AuPhone attribute would be the empty string, or perhaps the string no phone, but not a NULL. Thus, the appropriateness of allowing NULL values for an attribute depends upon the context.
The issue of whether NULLs should appear in a key needs some discussion. The purpose of a key is to provide a means for uniquely identifying entities, and so it would seem that keys and NULLs are incompatible. However, it is impractical to never allow NULLs in any keys. For instance, for the Publishers entity, this would mean not allowing a PubPhone to be NULL, since {PubName,PubPhone} is a key. On the other hand, the so-called entity integrity rule says that NULLs are not allowed in a primary key.
As a final remark, the presence of a NULL as a foreign key value does not violate referential integrity. That is, referential integrity requires that every non-NULL value in a foreign key must have a match in the referenced key.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 4: Database Design Principles
In Chapter 1 I tried to present a convincing case for why most databases should be modeled as relational databases, rather than single-table flat databases. I tried to make it clear why I split the single LIBRARY_FLAT table into four separate tables: AUTHORS, BOOKS, PUBLISHERS, and BOOK/AUTHOR.
However, for large real-life databases, it is not always clear how to split the data into multiple tables. As I mentioned in Chapter 1, the goal is to minimize redundancy, without losing any information.
The problem of effective database design is a complex one. Most people consider it an art rather than a science. This means that intuition plays a major role in good design. Nonetheless, there is a considerable theory of database design, and it can be quite complicated. My goal in this chapter is to touch upon the general ideas, without becoming involved in the details. Hopefully, this discussion will provide a helpful guide to the intuition needed for database design.
As we saw in Chapter 1, redundant data tends to inflate the size of a database, which can be a very serious problem for medium to large databases. Moreover, redundancy can lead to several types of anomalies, as discussed earlier. To understand the problems that can arise from redundancy, we need to take a closer look at what redundancy means.
Let us begin by observing that the attributes of a table scheme can be classified into three groups:
  • Attributes used strictly for identification purposes
  • Attributes used strictly for informational purposes
  • Attributes used for both identification and informational purposes
For example, consider the table scheme:
{PubID,PubName,PubPhone,YearFounded}
In this scheme, PubID is used strictly for identification purposes. It carries no informational content. On the other hand, YearFounded is strictly for informational purposes in this context. It gives the year that the publishing company was founded, but is not required for identification purposes.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Redundancy
As we saw in Chapter 1, redundant data tends to inflate the size of a database, which can be a very serious problem for medium to large databases. Moreover, redundancy can lead to several types of anomalies, as discussed earlier. To understand the problems that can arise from redundancy, we need to take a closer look at what redundancy means.
Let us begin by observing that the attributes of a table scheme can be classified into three groups:
  • Attributes used strictly for identification purposes
  • Attributes used strictly for informational purposes
  • Attributes used for both identification and informational purposes
For example, consider the table scheme:
{PubID,PubName,PubPhone,YearFounded}
In this scheme, PubID is used strictly for identification purposes. It carries no informational content. On the other hand, YearFounded is strictly for informational purposes in this context. It gives the year that the publishing company was founded, but is not required for identification purposes.
Consider also the table scheme:
{Title,PubID,AuID,PageCount,CopyrightDate}
In this case, if we assume that there is only one book of a given title published by a given publisher and written by a given author, then {Title,PubID,AuID} is a key. Hence, each of these attributes is used (at least in part) for identification. However, Title is also an informational attribute.
I should hasten to add that these classifications are somewhat subjective and depend upon the assumptions made about the entity class. Nevertheless, this classification does provide a useful intuitive framework.
We can at least pin down the strictly informational attributes a bit more precisely by making the following observation. The sign that an attribute is being used (at least in part) for identification purposes is that it is part of some key. Thus, an attribute that is not part of any key is being used, in that table scheme, strictly for informational purposes. Let us call such an attribute a
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Normal Forms
Those who make a study of database design have identified a number of special forms, properties, or constraints that a table scheme may possess, in order to achieve certain desired goals, such as minimizing redundancy. These forms are called normal forms. There are six commonly recognized normal forms, with the inspired names:
  • First normal form (1NF)
  • Second normal form (2NF)
  • Third normal form (3NF)
  • Boyce Codd normal form (BCNF)
  • Fourth-normal form (4NF)
  • Fifth normal form (5NF)
We will consider the first four of these normal forms, but only informally. Each of these normal forms is stronger than its predecessors. Thus, for instance, a table scheme that is in third normal form is also in second normal form. While it is generally desirable for the table schemes in a database to have a high degree of normalization, as we will see in this chapter, the situation is not as simple as it may seem.
For instance, requiring that all table schemes be in BCNF may cause some loss of information about the various relationships between the table schemes. In general, it is possible to manipulate the data to achieve third normal form for all table schemes, but this may turn out to be far more work than it is worth.
The plain fact is that forcing all table schemes to be in a particular normal form may require some compromises. Each individual situation (database) must be examined on its own merit. It is impossible to make general rules that apply in all situations.
The process of changing a database design to produce table schemes in normal form is called normalization .
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
First Normal Form
First normal form is very simple. A table scheme is said to be in first normal formif the attribute values are indivisible. To illustrate, we considered in Chapter 1 the question of including all the authors of a book in a single attribute, called Authors. Here is an example entity:
ISBN = 0-55-123456-9
Title = Main Street
Authors = Jones, H. and Smith, K.
Publisher = Small House
Since the table scheme in this case allows more than one author name for the Authors attribute, the scheme is not in first normal form. Indeed, one of the obvious problems with the Authors attribute is that it is impossible to sort the data by individual author name. It is also more difficult to, for instance, prepare a mailing label for each author, and so on.
Attributes that allow only indivisible values are said to be scalar attributes or atomic attributes. By contrast, an attribute whose values can be, for example, a list of items (such as a list of authors) is said to be a structured attribute . Thus, a table scheme is in first normal form if all of its attributes are atomic. Good database design almost always requires that all attributes be atomic, so that the table scheme is in first normal form.
In general, making the adjustments necessary to ensure first normal form is not hard, and it is a good general rule that table schemes should be put in first normal form. However, as with the other normal forms (and even more so the higher up we go) each situation must be considered on its own merits. For instance, a single field might be designed to hold a street address, such as "1333 Bessemer Street." Whether the house number and the street name should be separated into distinct attributes is a matter of context. Put another way, whether a street address is atomic depends upon the context. If there is reason to manipulate the street numbers apart from the street names, then they should certainly constitute their own attribute. Otherwise, perhaps not.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Functional Dependencies
Before we can discuss the other normal forms, we need to discuss the concept of functional dependency , which is used to define these normal forms. This concept is quite simple, and we have actually been using it for some time now. As an example, we have remarked that, for the Publishers table scheme, the PubName attribute depends completely on the PubID attribute. (More properly, we should say that the value of the PubName attribute depends completely on the value of the PubID attribute, but the earlier shorthand is convenient.) Thus, we can say that the functional dependency from PubID to PubName, written:
PubID PubName
holds for the Publishers table scheme. This can be read "PubID determines PubName" or "PubName depends on PubID."
More generally, suppose that {A1,...,Ak} are attributes of a table scheme and that {B1,...,Bn} are also attributes of the same table scheme. We do not require that the Bs be different from the As. Then the attributes B1,...,Bn depend on the attributes A1,...,Ak, written:
{A1,...,Ak} {B1,...,Bn}
if the values of A1,...,Ak completely determine the values of B1,...,Bn. Our main interest is when there is only one attribute on the right:
{A1,...,Ak} {B}
For instance, it is probably safe to say that:
{PubName,PubPhone} {PubID}
which is just another way of saying that there is only one publisher with a given name and phone number (including area code).
It is very important to understand that a functional dependency means that the attributes on the left completely determine the attributes on the right for now and for all time to come, no matter what additional data may be added to the database. Thus, just as the concept of a key relates to entity classes (table schemes) rather than individual entity sets (tables), so does functional dependency. Every table scheme has its set of associated functional dependencies, which are based on the meaning of the attributes.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Second Normal Form
Intuitively, a table scheme T is in second normal form if all of the strictly informational attributes (attributes that do not belong to any key) are attributes of the entities in the table scheme, and not of some other class of entities. In other words, the informational attributes provide information specifically about the entities in this entity class and not about some other entities.
Let us illustrate with an example. Consider a simplified table scheme designed to store house addresses. One possibility is:
{City,Street,HouseNumber,HouseColor,CityPopulation}
The CityPopulation attribute is out of place here because it is an attribute of cities, not house addresses. More specifically, CityPopulation is strictly an informational attribute (not for identification of houses), but it gives information about cities, not house addresses. Thus, this table scheme is not in second normal form.
We can be a little bit more formal about the meaning of second normal form as follows. Referring to the previous example, we have the dependency:
{City} {CityPopulation}
where CityPopulation does not belong to any key, and where City is a proper subset of a key, namely, the key {City, Street, HouseNumber}. (By proper subset, we mean a subset that is not the whole set.)
A table scheme is in 2NF if it is not possible to have a dependency of the form:
{A1,...,Ak} {B}
where B does not belong to any key (is strictly informational) and {A1,...,Ak} is a propersubset of some key, and thus does not identify the entities of this entity class, but rather identifies the entities of some other entity class.
Let us consider another example of a table scheme that is not in second normal form.
Consider the following table scheme, and assume for the purposes of illustration that, while there may be many books with the same title, no two of them have the same publisher and author:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Third Normal Form
Second normal form is good, but we can do better. We have seen that if a table scheme is in second normal form, then no strictly informational attribute depends on a proper subset of a key. However, there is another undesirable possibility. Let us illustrate with an example.
Consider the following table scheme and assume, for the purposes of illustration, that no two books with the same title have the same publisher:
{Title,PubID,PageCount,Price}
The only key for this table scheme is {Title,PubID}. Both PageCount and Price are informational attributes only.
Now, let us assume that each publisher decides the price of its books based solely on the page count. First, we observe that this table is in second normal form. To see this, consider the proper subsets of the key. These are:
{Title} and {PubID}
But none of the dependencies:
{Title} {PageCount}
{Title} {Price}
{PubID} {PageCount}
{PubID} {Price}
hold for this table scheme. After all, knowing the title does not determine the book, since there may be many books of the same title, published by different publishers. Hence, the table is in second normal form.
It is also not correct to say that:
{PageCount} {Price}
holds, because different publishers may use different price schemes based on page count. In other words, one publisher may price books over 1,000 pages at one price, whereas another may price books over 1,000 pages at a different price. However, it is true that:
{PubID,PageCount} {Price}
holds. In other words, here we have an informational attribute (Price) that depends not on a proper subset of a key, but on a proper subset of a key (PubID) together with another informational attribute (PageCount).
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Boyce-Codd Normal Form
It is possible to find table schemes that are in third normal form, but still have redundancy. Here is an example.
Consider the table scheme {City,StreetName,ZipCode}, with dependencies:
{City,StreetName} {ZipCode}
and:
{ZipCode} {City}
(Although in real life, a zip code may be shared by two different cities, we will assume otherwise for the purposes of illustration.) This table scheme is in third normal form. To see this, observe that the keys are {City,StreetName} and {ZipCode,StreetName}. Hence, no attribute is strictly informational, and there is nothing to violate third normal form.
On the other hand, consider Table 4-4. We can fill in the blank city name because {ZipCode}{City}.
Table 4-4: A table with dependencies
City
StreetName
ZipCode
Los Angeles
Hollywood Blvd
95000
Vine St
95000
The problem here is with the dependency:
{ZipCode}{City}
which does not violate third normal form because, as we have mentioned, {City} is not strictly informational.
The previous example gives us the idea to strengthen the condition in the definition of third normal form by dropping the requirement that B be strictly informational. Thus, we can define our last, and strongest, normal form. A table scheme is in Boyce-Codd normal form if it is not possible to have a dependency of the form:
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Normalization
Content preview·Buy PDF of this chapter|