We saw in Chapter 1 that a relation variable (relvar for short) is a variable whose permitted values are relations, and that it's specifically relvars, not relations, that are the target for INSERT, DELETE, and UPDATE operations. We also saw that INSERT, DELETE, and UPDATE are all just shorthand for certain relational assignments. I remind you too that (a) if R is a relvar and r is a relation to be assigned to R, then R and r must be of the same relation type, and (b) the terms heading, body, attribute, tuple, cardinality, and degree, formally defined in Chapter 3 for relations, can all be interpreted in the obvious way to apply to relvars as well. Now it's time to take a closer look at these matters. As a basis for examples, I'll use the following Tutorial D definitions for the base relvars in the suppliers-and-parts database:
VAR S BASE RELATION { SNO SNO, SNAME NAME, STATUS INTEGER, CITY CHAR } KEY { SNO } ; VAR P BASE RELATION { PNO PNO, PNAME NAME, COLOR COLOR, WEIGHT WEIGHT, CITY CHAR } KEY { PNO } ; VAR SP BASE RELATION { SNO SNO, PNO PNO, QTY QTY } KEY { SNO, PNO } FOREIGN KEY { SNO } REFERENCES S FOREIGN KEY { PNO } REFERENCES P ;
The first point I want to stress is that, regardless of what syntax we use to express it, relational assignment is a set-level operation. (In fact, all operations in the relational model are set-level, as we'll see in Chapter 5.) Thus, INSERT inserts a set of tuples into the target relvar; DELETE deletes a set of tuples from the target relvar; and UPDATE updates a set of tuples in the target relvar. Now, it's true that we often talk in terms of (for example) updating some individual tuple as such, but you need to understand that:
Such talk really means the set of tuples we're updating just happens to have cardinality one.
What's more, updating a set of tuples of cardinality one sometimes isn't possible anyway.
For example, suppose relvar S is subject to the integrity constraint (see Chapter 6) that suppliers S1 and S4 are always in the same city. Then any "single-tuple UPDATE" that tries to change the city for just one of those two suppliers will necessarily fail. Instead, we must update them both at the same time, perhaps like this (SQL):
UPDATE S SET CITY = 'New York' WHERE S.SNO = SNO('S1') OR S.SNO = SNO('S4') ;
What's being updated here is, obviously enough, a set of two tuples.
Note
Here, for interest, is the same update expressed in Tutorial D (it looks very similar, as you can see):
UPDATE S WHERE SNO = SNO('S1') OR SNO = SNO('S4') ( CITY := 'New York' ) ;
One consequence of the foregoing is that there's nothing in the relational model that resembles SQL's "positioned updates" (that is, UPDATE or DELETE "WHERE CURRENT OF cursor"), because those operations are tuple-level, not set-level, by definition. They do happen to work in today's products, most of the time, but that's because those products aren't very good at supporting integrity constraints. If the products were to improve in that regard, those "positioned updates" might not work any more; that is, applications that succeed today might fail tomorrow—not a very desirable state of affairs, it seems to me.
Now I need to 'fess up to something. The fact is, to talk as I've been doing of "updating a tuple"—or set of tuples, rather—is very imprecise (not to say sloppy) anyway. If V is subject to update, then V must be a variable by definition, not a value, and tuples, like relations, are values and can't be updated, again by definition. What we really mean when we talk of (for example) updating tuple t1 to t2, within some relvar R, is that we're replacing tuple t1 in R by another tuple t2. And that kind of talk is still sloppy! What we really mean is that we're replacing the relation r1 that's the original value of R by another relation r2. And what exactly is relation r2 here? Well, let s1 and s2 be relations containing just tuple t1 and tuple t2, respectively; then r2 is (r1 MINUS s1) UNION s2. In other words, "updating tuple t1 to t2 in relvar R" can be thought of as first deleting t1 and then inserting t2—if (despite everything I've been saying) I might be permitted to talk in terms of deleting and inserting individual tuples in this loose fashion.
In the same kind of way, it doesn't really make sense to talk in terms of "updating attribute A within tuple t"—or within relation r, or even within relvar R. Of course, we do it anyway, because it's convenient (it saves a lot of circumlocution); but it's like that business of user-friendly terminology I discussed in Chapter 1—it's OK to do it only if we all understand that such talk is only an approximation to the truth, and indeed it tends to obscure the essence of what's really going on.
I explained the basic idea of candidate keys in Chapter 1, but now I want to make the concept more precise. Here's a definition:
Definition: Let K be a subset of the heading of relvar R. Then K is a candidate key (or just key for short) for R if and only if it possesses both of the following properties:
Note
In accordance with usual practice, throughout this book I take statements of the form "B is a subset of A" and "A is a superset of B" to include the possibility that A and B might be equal. If I want to exclude that possibility, I'll talk explicitly in terms of proper subsets and supersets.
Now, the uniqueness property is self-explanatory, but I need to say a bit more about the irreducibility property. Consider relvar S and the set of attributes {SNO,CITY}—let's call it SK—which is certainly a subset of the heading of S that has the uniqueness property (no relation that's a possible S value ever has two distinct tuples with the same SK value). But it doesn't have the irreducibility property, because we could discard the CITY attribute and what's left, the set {SNO}, would still have the uniqueness property. So we don't regard SK as a key, because it's "too big." By contrast, {SNO} is irreducible, and it's a key.
Why do we want keys to be irreducible? One reason is that if we were to specify a "key" that was not irreducible, the DBMS wouldn't be able to enforce the uniqueness constraint properly. For example, suppose we lied and told the DBMS that {SNO,CITY} was a key. Then it couldn't enforce the constraint that supplier numbers are "globally" unique; instead, it could enforce only the weaker constraint that supplier numbers are "locally" unique, in the sense that they're unique within a given city. So this is one reason (not the only one) why we require keys not to include any attributes that aren't needed for unique identification purposes.
Now, all of the relvars we've seen so far have had just one key. Here, by contrast, are some with two or more (they're meant to be self-explanatory). Note the overlapping nature of the keys in the second and third examples.
VAR TAX_BRACKET BASE RELATION { LOW MONEY, HIGH MONEY, PERCENTAGE INTEGER } KEY { LOW } KEY { HIGH } KEY { PERCENTAGE } ; VAR ROSTER BASE RELATION { DAY DAY_OF_WEEK, TIME TIME_OF_DAY, GATE GATE, PILOT NAME } KEY { DAY, TIME, GATE } KEY { DAY, TIME, PILOT } ; VAR MARRIAGE BASE RELATION { SPOUSE_A NAME, SPOUSE_B NAME, DATE_OF_MARRIAGE DATE } /* assume no polygamy and no couple marrying */ /* each other more than once ... */ KEY { SPOUSE_A, DATE_OF_MARRIAGE } KEY { DATE_OF_MARRIAGE, SPOUSE_B } KEY { SPOUSE_B, SPOUSE_A } ;
I'll close this section with a few miscellaneous points. First, note that the key concept applies to relvars, not relations. Why? Because to say something is a key is to say a certain integrity constraint is in effect—a certain uniqueness constraint, to be specific—and integrity constraints apply to variables, not values. (Integrity constraints constrain updates, and updates apply to variables, not values. See Chapter 6 for further discussion.)
Second, if R is a relvar, then R certainly does have at least one key. The reason is that every possible value of R is a relation and therefore contains no duplicate tuples, by definition; at the very least, therefore, the combination of all of the attributes of R certainly has the uniqueness property.[*] Thus, either that combination also has the irreducibility property or there's some proper subset of that combination that does. Either way, there's something that's both unique and irreducible.
Third, note that key values are tuples. In the case of relvar S, for example, with its sole key {SNO}, the value of that key for some specific tuple—say, that for supplier S1—is:
TUPLE { SNO SNO('S1') }
(Recall from Chapter 3 that every subset of a tuple is a tuple in turn.) Of course, in practice we would usually say, informally, that the key value in this example is just S1—or SNO('S1'), rather—but it really isn't.
Following on from the previous point: it should now be clear that the key concept, like so much else in the relational model, relies on the fundamental concept of tuple equality . That is, in order to enforce the uniqueness constraint, we need to be able to tell when two key values are equal, and that's precisely a matter of tuple equality—even when, as in the case of relvar S, the tuples in question are of degree one and "look like" simple scalar values.
My final point has to do with the notion of functional dependency. I don't want to get into a lot of detail regarding that concept here—I'll do that in Chapter 7—but you're probably familiar with it anyway; all I want to do here is call your attention to the following. Let K be a key for relvar R, and let A be any attribute of R. Then R necessarily satisfies the functional dependency:
K → A
To elaborate briefly: in general, the functional dependency K → A means that whenever two tuples of R have the same value for K, they also have the same value for A. But if two tuples have the same value for K, where K is a key, then by definition they're the very same tuple!—and so they must have the same value for A. In other words, loosely: we always have "functional dependency arrows" out of keys to everything else in the relvar. Again, I'll revisit these ideas in Chapter 7.
I explained the basic idea of foreign keys in Chapter 1, but here's a precise definition (note the reliance on tuple equality once again):
Definition: Let R1 and R2 be relvars, not necessarily distinct, and let K be a key for R1. Let FK be a subset of the heading of R2 that, possibly after some attribute renaming, involves exactly the same attributes as K. Then FK is a foreign key if and only if, at all times, every tuple in R2 has an FK value that is equal to the K value in some (necessarily unique) tuple in R1 at the time in question.
As we know, in the suppliers-and-parts database, {SNO} and {PNO} are foreign keys in relvar SP, referencing the sole candidate key—in fact, the primary key—in relvar S and relvar P, respectively. Here's another example:
VAR EMP BASE RELATION { ENO ENO, ..., MNO ENO, ... } KEY { ENO } FOREIGN KEY { RENAME ( MNO AS ENO ) } REFERENCES EMP ;
Attribute MNO here denotes the employee number of the manager of the employee identified by ENO; thus, the "referencing relvar" (R2 in the definition) and the "referenced relvar" (R1 in the definition) in this example are one and the same. For example, the EMP tuple for employee E3 might include an MNO value of E2, which constitutes a reference to the EMP tuple for employee E2. But foreign key values, like candidate key values, are tuples ; conceptually, therefore, we have to rename the MNO attribute in the foreign key specification, in order for the tuple equality comparison to be valid. (What tuple equality comparison? The one that's implicit in the process of checking the foreign key constraint—recall that tuples must certainly be of the same type if they're to be tested for equality, and "same type" means they must have the same attribute names.)
As an aside, I should mention that the relational model as originally formulated required foreign keys to match not just some candidate key but, very specifically, the primary key in the referenced relvar. However, I gave my reasons in Chapter 1 for not insisting that some candidate key always be chosen and made primary; accordingly, therefore, I don't want to insist that foreign keys always match primary keys specifically. (I agree with SQL on this one.)
Now, SQL supports not just foreign keys as such but also certain associated referential actions such as CASCADE (which can be specified as part of either an ON DELETE clause or an ON UPDATE clause). For example, the CREATE TABLE statement for shipments might include the following:
FOREIGN KEY ( SNO ) REFERENCES S ( SNO ) ON DELETE CASCADE
Given this specification, an attempt to delete a specific supplier will cascade to delete all shipments for that supplier as well. I mention this point for the following reasons:
First, such specifications might be useful in practice, but they aren't part of the relational model as such.
But that's not necessarily a problem! The relational model is the foundation of the database field, but it's only the foundation. There's no reason why additional features shouldn't be built on top of, or alongside, that foundation—just so long as those additions don't violate the prescriptions of the model, of course (and are in the spirit of the model and can be shown to be useful, I suppose I should add). To elaborate:
Type theory provides the most obvious example. We saw in Chapter 2 that "types are orthogonal to tables," but we also saw that full and proper type support in relational systems is highly desirable, to say the very least.
By way of a second example, the relational model has almost nothing to say about recovery and concurrency controls, but this fact obviously doesn't mean that relational systems shouldn't provide such controls. (Actually it could be argued that the relational model does say something about such matters implicitly, because it relies on the DBMS to implement updates properly and not to lose data—but it doesn't prescribe anything specific.)
One final remark to close this section: I've discussed foreign keys because they're of considerable pragmatic importance, and also because they're part of the model as originally defined. But I think I should stress the point that they aren't truly fundamental—they're really just shorthand for certain integrity constraints that are commonly required in practice, as we'll see in Chapter 6.[*] (In fact, much the same could be said for candidate keys as well, but in that case the practical benefits of providing a shorthand are overwhelming.)
A view , also called a virtual relvar, is a relvar that doesn't have separate existence in its own right but looks to the user as if it did. Here's a definition:
Definition: A view or virtual relvar V is a relvar whose value at time t is the result of evaluating a certain relational expression at that time t. The expression in question is specified when V is defined and must mention at least one relvar.
The following are a couple of examples, "London suppliers" and "non-London suppliers" (Tutorial D on the left, SQL on the right):
VAR LS VIRTUAL | CREATE VIEW LS AS ( S WHERE CITY = 'London' ) ; | ( SELECT S.* | FROM S | WHERE S.CITY = 'London' ) ; VAR NLS VIRTUAL | CREATE VIEW NLS AS ( S WHERE CITY ≠ 'London' ) ; | ( SELECT S.* FROM S | | WHERE S.CITY <> 'London' ) ;
The parentheses in all of these examples are unnecessary but not wrong. I include them for clarity.
To repeat, views are meant to look to the user as if they had their own separate existence; in other words, they're supposed to "look and feel" just like base relvars so far as the user is concerned. In particular, the user should be able to operate on views as if they were base relvars, and the DBMS should be able to map those user operations into suitable operations on the base relvars in terms of which the views are ultimately defined. I say "ultimately" here because (of course) one thing we can do, if views really do behave just like base relvars, is define further views on top of them, as in this SQL example:
CREATE VIEW LS_STATUS AS ( SELECT LS.SNO, LS.STATUS FROM LS ) ;
Mapping read-only operations is straightforward. For example, suppose we issue this SQL query on view LS:
SELECT LS.SNO FROM LS WHERE LS.STATUS > 10
First, the DBMS replaces the reference to the view in the FROM clause by the expression that defines that view, yielding:
SELECT LS.SNO FROM ( SELECT S.* FROM S WHERE S.CITY = 'London' ) AS LS WHERE LS.STATUS > 10
This expression can now be simplified to:
SELECT S.SNO FROM S WHERE S.CITY = 'London' AND S.STATUS > 10
The reason the foregoing process works is precisely because of the closure property of the relational algebra. Closure implies, among other things, that wherever we're allowed to have the name of something—for example, in a query—we can always have a more general expression that evaluates to a thing of the appropriate type.[*] In the FROM clause, for example, we can have an SQL table name; thus we can also have a more general SQL table expression, and that's why we're allowed to substitute the expression that defines the view LS for the name LS in the example.
By the way, it's worth mentioning that the foregoing process didn't always work in early versions of SQL (to be specific, in versions of the SQL standard prior to 1992), because those early versions failed to support closure properly. As a result, certain innocuous-looking queries against certain innocuous-looking tables (actually views) failed—and failed, moreover, in ways that were hard to explain. Here's a simple example:
CREATE VIEW V AS ( SELECT S.CITY, SUM ( S.STATUS ) AS ST FROM S GROUP BY S.CITY ) ; SELECT V.CITY FROM V WHERE V.ST > 25
This example failed in the SQL standard prior to 1992. And although the standard has now been fixed, it doesn't follow that all of the products have! And indeed there's at least one major product that still hasn't, at the time of writing (early 2005).
I turn now to update operations. Before I get into specifics, I want to look at the London and non-London supplier views again (and now I'll switch to Tutorial D):
VAR LS VIRTUAL ( S WHERE CITY = 'London' ) ; VAR NLS VIRTUAL ( S WHERE CITY ≠ 'London' ) ;
The important point here is as follows: instead of S being the base relvar and LS and NLS views, LS and NLS could be base relvars and S could be a view, like this:
VAR LS BASE RELATION { SNO SNO, SNAME NAME, STATUS INTEGER, CITY CHAR } KEY { SNO } ; VAR NLS BASE RELATION { SNO SNO, SNAME NAME, STATUS INTEGER, CITY CHAR } KEY { SNO } ; VAR S VIRTUAL ( LS UNION NLS ) ;
Note
In order to achieve complete equivalence, we would also have to specify certain constraints—in particular, constraints to the effect that every CITY value in LS is London and every CITY value in NLS is not London—but I omit such details here. See Chapter 6 for further discussion of such matters.
The message of this example is that, to a very large extent, which relvars are base and which virtual is arbitrary—from which it follows that there must be no arbitrary and unnecessary distinctions between base and virtual relvars. This state of affairs is referred to as The Principle of Interchangeability (of base and virtual relvars). Here are some implications:
Like base relvars, views are subject to integrity constraints. (We usually think of integrity constraints as applying just to base relvars, but The Principle of Interchangeability shows that this position isn't really tenable.)
In particular, views have candidate keys (and so I should perhaps have included some key specifications in my view examples prior to this point; Tutorial D permits such specifications, but SQL doesn't). They might also have foreign keys, and foreign keys might refer to them.
I didn't mention this point in Chapter 1, but the "entity integrity" rule is supposed to apply specifically to base relvars, not views. It thereby violates The Principle of Interchangeability. (Of course, I reject that rule anyway, because it has to do with nulls.)
Many SQL products, and the SQL standard, provide some kind of "row ID" feature. If that feature applies to base tables and not to views—which in practice is quite likely—it violates The Principle of Interchangeability.[*] Of course, row IDs as such aren't part of the relational model, but that fact in itself doesn't mean they shouldn't be supported. But I observe as an important aside that if those row IDs are regarded as some kind of object ID in the object-oriented sense (as they are, most unfortunately, in the SQL standard, as well as in most of the major SQL products), then they're definitely prohibited! Object IDs are effectively pointers, and the relational model explicitly prohibits pointers.
And, to revert to the main point of this discussion, we must be able to update views—because if we can't, then that fact in itself constitutes the clearest possible violation of The Principle of Interchangeability.
As you probably know, SQL's support for this requirement is quite weak, both in the standard and in the major commercial products. It does at least typically include support for updates on views defined as simple restrictions and/or projections of a single underlying base relvar (though even here there are some problems). For example, consider the following view (which is essentially identical to one we saw in Chapter 1):
CREATE VIEW SST_PARIS AS ( SELECT S.SNO, S.STATUS FROM S WHERE S.CITY = 'Paris' ) ;
This view is a projection of a restriction of base table S, and so we might, for example, be able to perform the following DELETE on it:
DELETE FROM SST_PARIS WHERE SST_PARIS.STATUS > 15 ;
This DELETE maps to:
DELETE FROM S WHERE S.CITY = 'Paris' AND S.STATUS > 15 ;
But few products provide support for updating views that are much more sophisticated than this one.
Unfortunately, I'm now straying into an area in which there's still a certain amount of controversy. My own opinion is that the view updating problem has largely been solved (that is, the theory exists); however, not everybody agrees with me, and in any case a detailed discussion of the subject requires rather more background than it's possible to include in a book of this nature. Thus, I'm afraid the best I can do here is refer you to another book—Databases, Types, and the Relational Model: The Third Manifesto, Third Edition (Addison-Wesley, 2006), by C. J. Date and Hugh Darwen, where the subject is examined in depth—if you want more specifics.
There are a few more things I need to say in order to finish up this section. First of all, it's well known, but worth mentioning anyway, that views serve two rather different purposes:
The user who actually defines view V is, obviously, aware of the expression X in terms of which V is defined. That user can use the name V wherever the expression X is intended, but such uses are basically just shorthand.
By contrast, a user who's merely informed that V exists and is available for use is supposed (at least ideally) not to be aware of the expression X; to that user, in fact, V is supposed to look and feel just like a base relvar, as we've already seen. And it's this second use of views that's the really important one, and the one I've been concentrating on, tacitly, throughout this section so far.
Second, when I explained what a view was at the beginning of this section, I said the relational expression that defined the view had to mention at least one relvar. Why? Because if it doesn't, the "virtual relvar" won't be a relvar at all!—I mean, it won't be a variable, and it certainly won't be updatable. Instead, it'll be a relation constant , or what we might call a "relcon." For example (to invent some syntax on the fly):
CONST PERIODIC_TABLE ( RELATION { TUPLE { ELEMENT 'Hydrogen', SYMBOL 'H' , ATOMICNO 1 }, TUPLE { ELEMENT 'Helium' , SYMBOL 'He', ATOMICNO 2 }, ... TUPLE { ELEMENT 'Uranium' , SYMBOL 'U' , ATOMICNO 92 } } ) ;
While it certainly might be desirable to provide some kind of "relcon" functionality along the foregoing lines, I don't think we should think of such things as relvars. I don't think it helps the cause of understanding to pretend that constants are variables.
Third, an unfortunate terminological clash is arising as I write, certainly in the academic world, and to some extent in the commercial world also. Recall from Chapter 1 that a view can be thought of as a derived relvar. Well, there's another kind of derived relvar, too, called a snapshot . As the name might suggest, a snapshot, although it's derived, is real, not virtual—meaning it's represented not just by its definition in terms of other relvars but also, at least conceptually, by its own separate copy of the data. For example (to invent some syntax again):
VAR LSS SNAPSHOT ( S WHERE CITY = 'London' ) REFRESH EVERY DAY ;
Defining a snapshot is just like executing a query, except that:
The result of the query is saved in the database under the specified name (LSS in the example) as a read-only relvar (read-only, that is, apart from the periodic refresh; see the next bullet item).
Periodically (EVERY DAY in the example) the snapshot is refreshed—its current value is discarded, the query is executed again, and the result of that new execution becomes the new snapshot value.
In the example, therefore, snapshot LSS represents the data as it was at most 24 hours ago.
Snapshots are important in data warehouses, distributed systems, and many other contexts. In all cases, the rationale is that many applications can tolerate, or might even require, data "as of " some particular point in time. Reporting and accounting applications are a case in point; such applications typically require the data to be frozen at an appropriate moment (for example, at the end of an accounting period), and snapshots allow such freezing to occur without locking out other applications.
So far, so good. The problem is that snapshots have come to be known (at least in some circles) not as snapshots at all but as materialized views. But snapshots aren't views! Indeed, the whole point about views, at least so far as the relational model is concerned, is that they aren't materialized; as we've seen, operations on views are implemented by mapping them into suitable operations on the underlying base relvars. Thus, "materialized view" is simply a contradiction in terms. Worse yet, the unqualified term view is now often taken to mean a "materialized view" specifically—again, at least in some circles—and so we're in danger of no longer having a good term to mean a view in the original sense. In this book, of course, I do use the term view in its original sense, but be warned that it doesn't always have that meaning elsewhere. Caveat lector.
Now we come to what in many ways is the most important part of this chapter. The essence of it is this: there's another way to think about relvars. I mean, most people think of relvars as if they were just files in the traditional computing sense—rather abstract files, perhaps (maybe disciplined would be a better word than abstract), but files nonetheless. But there's a different way to look at them, a way that I believe can lead to a much deeper understanding of what's really going on. It goes like this.
Consider the suppliers relvar S. Like all relvars, that relvar is supposed to represent some portion of the real world. In fact, I can be more precise: the heading of that relvar represents a certain predicate , meaning it's a kind of generic statement about some portion of the real world (it's generic because it's parameterized, as I'll explain in a moment). The predicate in question looks like this:
Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in city CITY.
This predicate is the intended interpretation—in other words, the meaning, also called the intension (note the spelling)—for relvar S.
You can think of a predicate as a truth-valued function. Like all functions, it has a set of parameters, it returns a result when it's invoked, and (because it's truth-valued) that result is either TRUE or FALSE. In the case of the predicate just shown, for example, the parameters are SNO, SNAME, STATUS, and CITY (corresponding to the attributes of the relvar, of course), and they stand for values of the applicable types (SNO, NAME, INTEGER, and CHAR, respectively). When we invoke the function—when we instantiate the predicate, as the logicians say—we substitute arguments for the parameters. Suppose we substitute the arguments S1, Smith, 20, and London, respectively. Then we obtain the following proposition:
Supplier S1 is under contract, is named Smith, has status 20, and is located in city London.
In general, a proposition in logic is a statement that's unconditionally either true or false. Here are a couple of examples:
Edward Abbey wrote The Monkey Wrench Gang.
William Shakespeare wrote The Monkey Wrench Gang.
The first of these is true and the second false. Don't fall into the common trap of thinking propositions must always be true ones! However, the ones I'm talking about in connection with relvars are supposed to be true ones specifically, as I now explain:
First, every relvar has an associated predicate, called the relvar predicate for the relvar in question.
Let relvar R have predicate P. Then every tuple t appearing in R at some given time can be regarded as representing a certain proposition p, derived by invoking (or instantiating) P at that time with the attribute values from t as arguments.
And (very important!) we assume by convention that each proposition p that's obtained in this manner evaluates to TRUE.
Hence, given our usual sample value for relvar S, we assume the following propositions all evaluate to TRUE:
Supplier S1 is under contract, is named Smith, has status 20, and is located in city London.
Supplier S2 is under contract, is named Jones, has status 10, and is located in city Paris.
Supplier S3 is under contract, is named Blake, has status 30, and is located in city Paris.
And so on. What's more, we go further: if a certain tuple plausibly could appear in some relvar at some time but in fact doesn't, then we assume the corresponding proposition is false at that time (in other words, we adopt what's called the Closed World Assumption). For example, the tuple:
TUPLE { SNO SNO('S6'), SNAME NAME('Lopez'), STATUS 30, CITY 'Madrid' }
is—let's agree—a plausible supplier tuple but doesn't appear in relvar S at this time, and so we assume it's not the case that the following proposition is true at this time:
Supplier S6 is under contract, is named Lopez, has status 30, and is located in city Madrid.
In other words, a given relvar contains, at any given time, all and only the tuples that represent true propositions (true instantiations of the predicate) at that time.
More terminology: Again, let P be the relvar predicate or intension for relvar R, and let the value of R at some given time be relation r. Then r—or the body of r, to be more precise—constitutes the extension of P at that time. Note, therefore, that the extension varies over time, but the intension does not.
The ideas I've been discussing in this section all extend in a natural way to apply to arbitrary relational expressions. For example, consider this expression, which represents the projection of suppliers on all attributes but CITY:
S { SNO, SNAME, STATUS }
The result contains all tuples of the form:
TUPLE { SNO s, SNAME n, STATUS t }
such that a tuple of the form:
TUPLE { SNO s, SNAME n, STATUS t, CITY c }
currently appears in S for some CITY value c. In other words, the result represents the current extension of a predicate that looks like this:
There exists some city CITY such that supplier SNO is under contract, is named SNAME, has status STATUS, and is located in city CITY.
Observe that this predicate has just three parameters and the corresponding relation (the projection of suppliers on all but CITY) has just three attributes—CITY is not a parameter but what the logicians instead call a bound variable , owing to the fact that it's "quantified" by the phrase there exists some city. (See Appendix A for further explanation of bound variables and quantifiers.) A possibly clearer way of making the same point—that the predicate has just three parameters, not four—is to observe that the predicate in question is logically equivalent to this one:
Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in some city (in other words, somewhere, but we don't know where).
It follows from all of the above that views in particular represent certain predicates. For example, if view SST is defined as follows:
VAR SST VIRTUAL ( S { SNO, SNAME, STATUS } ) ;
then the relvar predicate for SST is precisely:
Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in some city.
There's one last point I want to make here about predicates and propositions. I've said a predicate has a set of parameters. As usual, however, that set might be empty—and if it is, the predicate becomes a proposition! (Certainly it's a statement that's unconditionally either true or false.) In other words, a proposition is a degenerate predicate; all propositions are predicates, but most predicates aren't propositions.
Chapter 2 was called Relations Versus types. However, I wasn't in a position in that chapter to explain the most important difference between those two concepts—but now I am, and I will.
I've shown that the database at any given time can be thought of as a collection of true propositions: for example, the proposition Supplier S1 is under contract, is named Smith, has status 20, and is located in city London. More specifically, I've shown that the argument values appearing in such a proposition (S1, Smith, 20, and London, in the example) are, precisely, the attribute values from the corresponding tuple, and of course each such attribute value is a value of the associated type. It follows that:
Types are sets of things we can talk about;
relations are (true) statements about those things.
In other words, types give us our vocabulary—the things we can talk about—and relations give us the ability to say things about the things we can talk about. (There's a nice analogy here that might help: Types are to relations as nouns are to sentences.) For example, if we limit our attention to suppliers only, for simplicity, we see that:
The things we can talk about are supplier numbers, names, integers, and character strings—and nothing else.
The things we can say are things of the form "The supplier with the specified supplier number is under contract, has the specified name, has the status denoted by the specified integer, and is located in the city denoted by the specified character string"—and nothing else. (Nothing else, that is, except for things that are logically implied by things we can say. For example, given the things we already know we can say about supplier S1, we can also say things like Supplier S1 is under contract, is named Smith, has status 20, and is located in some city—where the city is left unspecified. And if you're thinking that what I've just told you is very reminiscent of, and probably has some deep connection to, relational projection… well, you'd be absolutely right.)
The foregoing state of affairs has at least three important corollaries. To be specific, in order to represent "some portion of the real world" (as I put it in the previous section):
Types and relations are both necessary—without types, we have nothing to talk about; without relations, we can't say anything.
Types and relations taken together are sufficient, as well as necessary—we don't need anything else, logically speaking. (Well, we do need relvars, in order to reflect the fact that the real world changes over time, but we don't need them to represent the situation at any given time.)
Types and relations aren't the same thing. Beware of anyone who tries to pretend they are! In fact, pretending a type is just a special kind of relation is precisely what certain commercial products try to do (though they don't usually talk in such terms)—and I hope it's clear that any product that's founded on such a logical error is doomed to eventual failure. The products I have in mind aren't relational products, of course; typically, they're products that support objects in the object-oriented sense, or products that try somehow to marry such objects and SQL tables. (In fact, at least one of the products I have in mind has indeed already failed.) Further details are beyond the scope of this book, however.
I'd like to wind up this section by offering a slightly more formal perspective on some of what I've been saying. I've said a database can be thought of as a collection of true propositions. In fact, a database, together with the operators that apply to the propositions represented in that database (or to sets of such propositions, rather), is a logical system . And when I say "a logical system," I mean a formal system—like euclidean geometry, for example—that has axioms ("given truths") and rules of inference by which we can prove theorems ("derived truths") from those axioms. Indeed, it was Codd's very great insight, when he first invented the relational model back in 1969, that (despite the name) a database isn't really just a collection of data; rather, it's a collection of facts, or in other words true propositions. Those propositions—the given ones, that is, which is to say the ones represented by the base relvars—are the axioms of the logical system under discussion. The inference rules are essentially the rules by which new propositions can be derived from the given ones; in other words, they're the rules that tell us how to apply the operators of the relational algebra. Thus, when the system evaluates some relational expression (in particular, when it responds to some query), it's really deriving new truths from given ones; in effect, it's proving a theorem!
Once we understand the foregoing, we can see that the whole apparatus of formal logic becomes available for use in attacking "the database problem." In other words, questions such as:
What should the database look like to the user?
What should integrity constraints look like?
What should the query language look like?
How can we best implement queries?
More generally, how can we best evaluate database expressions?
How should results be presented to the user?
How should we design the database in the first place?
(and others like them) all become, in effect, questions in logic that are susceptible to logical treatment and can be given logical answers.
Of course, it goes without saying that the relational model supports the foregoing perception very directly—which is why, in my opinion, that model is rock solid, and "right," and will endure. It's also why, again in my opinion, other "data models" are simply not in the same ballpark. Indeed, I seriously question whether those other "models" deserve to be called models at all, in the same sense that the relational model can be called a model. Certainly most of them are ad hoc to a degree, instead of being firmly founded, as the relational model is, in set theory and predicate logic. I'll expand on these issues in Chapter 8.
The most important part of this chapter is the section having to do with predicates (along with the subsequent section, which discusses the related issue of relations versus types). Basically, every relvar R has an associated predicate P (the relvar predicate for R); P is the intended interpretation or intension for R, and it doesn't change over time. And if the current value of R is relation r, then r represents the current extension of P; it consists of a set of tuples, each of which represents a true proposition, or in other words a true instantiation of P. The extension does change over time. Thus, a database together with its operators can be seen as a logical system.
Here are some other important points from the body of the chapter:
Only relvars are updatable; talk of "updating a tuple" or "updating an attribute" is convenient but sloppy. Updating is always set-at-a-time.
Every relvar has at least one (candidate) key. Keys have the properties of uniqueness and irreducibility. Key values are tuples.
Some relvars have foreign keys. SQL supports certain associated "referential actions," such as CASCADE; such actions might well be useful in practice, but they aren't part of the relational model. What's more, foreign keys themselves aren't truly fundamental, either.
Operations on views are implemented by mapping them into appropriate operations on the underlying base relvars. (The mapping process, which basically works because of closure, is straightforward for read-only operations but less so for update operations.) The Principle of Interchangeability states that there must be no arbitrary and unnecessary distinctions between base and virtual relvars.
"Types are to relations as nouns are to sentences."
Types and relations are both necessary and sufficient to represent any data we like. (At the logical level, that is; of course, other constructs are useful at the physical level, as we all know, but that's because the objectives are different at that level. The physical level is deliberately beyond the purview of the relational model.)
Explain in your own words why remarks like (for example) "This UPDATE operation updates the status for suppliers in London" aren't very precise. Give a replacement for that remark that's as precise as you can make it.
Give definitions for SQL analogs of the TAX_BRACKET, ROSTER, and MARRIAGE relvars from the section "More on Candidate Keys."
In the body of the chapter, I gave one reason why key irreducibility is a good idea. Can you think of any others?
Relvar EMP from the section "More on Foreign Keys" is an example of what's sometimes called a self-referencing relvar. Invent some sample data for that relvar. Does this example lead inevitably to a requirement for null support? (Answer: No, but it does serve to show how seductive the nulls idea can be.) What can be done in this example if nulls are prohibited?
SQL has nothing analogous to Tutorial D's renaming option in its foreign key specifications. Why not?
Can you think of a situation in which two relvars R1 and R2 might each have a foreign key referencing the other?
Investigate any SQL product available to you. What referential actions does that product support? Which ones do you think are useful? Can you think of any others that the product doesn't support but might be useful?
The relational model has nothing to say about triggered procedures (often known simply as triggers). Is this omission a problem? If so, why? If not, why not? Do you think triggered procedures are necessary? Or desirable?
Let view LSSP be defined as follows (SQL):
CREATE VIEW LSSP AS ( SELECT S.SNO, S.SNAME, S.STATUS, SP.PNO, SP.QTY FROM S, SP WHERE S.SNO = SP.SNO AND S.CITY = 'London' ) ;
Here's a query on this view:
SELECT DISTINCT LSSP.STATUS, LSSP.QTY FROM LSSP WHERE LSSP.PNO IN ( SELECT P.PNO FROM P WHERE P.CITY <> 'London' )
What might the query that's actually executed on the underlying base relvars look like?
Investigate any SQL product available to you. Are there any apparently legitimate queries on views that fail in that product? If so, state as precisely as you can which ones they are. What justification does the vendor offer for failing to provide full support? (Note that this exercise asks about queries only, not updates.)
Investigate any SQL product available to you. What view updates does that product support? Be as precise as you can in your answer.
Using either the suppliers-and-parts database or any database you happen to be familiar with, give some further examples to illustrate the point that which relvars are base and which virtual is largely arbitrary.
Investigate any SQL product available to you. In what ways—there will be some!—does that product violate The Principle of Interchangeability?
Distinguish between views and snapshots. Does SQL support snapshots? Does any product that you're aware of?
Let DB be any database you happen to be familiar with and let R be any relvar in DB. What's the predicate for R? Note: The point of this exercise is to get you to apply the fundamental ideas discussed in the body of this chapter to your own data, in an attempt to get you thinking about data in general in such terms. Obviously the exercise has no unique right answer.
Consider views LS and NLS from the section "More on Views." What are the corresponding relvar predicates? Would it make any difference if they weren't views but base relvars instead?
What's the predicate for view LSSP from Exercise 4-13?
A key is a set of attributes, and the empty set is a legitimate set; thus, we could define an empty key to be a key where the set of attributes is empty. Can you think of any uses for such a key?
What's the predicate for a relvar of degree zero? (Does this question even make sense? Justify your answer.)
[*] We can't say the same for SQL tables, of course—SQL tables allow duplicate rows and so might have no key at all.
[*] Precisely for this reason, in fact, explicit support for them is currently omitted from Tutorial D. However, I'm sure an "industrial-strength" version of that language would support them, and I'm taking the liberty in this book of pretending such support is already there.
[*] I would much have preferred to use the more formal term object in this sentence in place of the very vague term thing, but object has become a loaded term in computing contexts.
Get Database in Depth now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.