Preface

This book began life as a comparatively short chapter in a book called Database in Depth: Relational Theory for Practitioners (O’Reilly, 2005). That book was superseded by SQL and Relational Theory: How to Write Accurate SQL Code (O’Reilly, 2009), where the design material, since it was somewhat tangential to the main theme of the book, ceased to be a chapter as such and became a (somewhat longer) appendix instead. I subsequently began work on a second edition of this latter book.[1] During the course of that work, I found there was so much that needed to be said on the subject of design that the appendix threatened to grow out of all proportion to the rest of the book. Since the topic was, as I’ve indicated, rather out of line with the major emphasis of that book anyway, I decided to cut the Gordian knot and separate the material out into a book of its own: the one you’re looking at right now.

Three points arise immediately from the foregoing:

  • First, the present book does assume you’re familiar with material covered in the SQL and Relational Theory book (in particular, it assumes you know exactly what relations, attributes, and tuples are). I make no apology for this state of affairs, however, since the present book is aimed at database professionals and database professionals ought really to be familiar with most of what’s in that earlier book, anyway.

  • Second, the previous point notwithstanding, there’s unavoidably a small amount of overlap between this book and that earlier book. I’ve done my best to keep that overlap to a minimum, however.

  • Third, there are, again unavoidably, many references in this book to that earlier one. Now, most references in this book to other publications are given in full, as in this example:

    Ronald Fagin: “Normal Forms and Relational Database Operators,” Proc. 1979 ACM SIGMOD Int. Conf. on Management of Data, Boston, Mass. (May/June 1979).

    In the case of references to the earlier book in particular, however, from this point forward I’ll give them in the form of the abbreviated title SQL and Relational Theory alone. What’s more, I’ll take that abbreviated title to mean the second edition specifically (where it makes any difference).

Actually I’ve published several short pieces over the years, in one place or another, on various aspects of design theory, and the present book is intended among other things to preserve the good parts of those earlier writings. But it’s not just a cobbling together of previously published material, and I sincerely hope it won’t be seen as such. For one thing, it contains much new material. For another, it presents a more coherent, and I think much better, perspective on the subject as a whole (I’ve learned a lot myself over the years!). Indeed, even when some portion of the text is based on some earlier publication, the material in question has been totally rewritten and, I trust, improved.

Now, there’s no shortage of books on database design; so what makes this one different? In fact I don’t think there’s a book quite like this one on the market. There are many books (of considerably varying quality, in my opinion) on design practice, but those books (again, in my not unbiased opinion) usually don’t do a very good job of explaining the underlying theory. And there are a few books on design theory, too, but they tend to be aimed at theoreticians, not practitioners, and to be rather academic in tone. What I want to do is bridge the gap; in other words, I want to explain the theory in a way that practitioners should be able to understand, and I want to show why that theory is of considerable practical importance. What I’m not trying to do is be exhaustive; I don’t want to discuss the theory in every last detail, I want to concentrate on what seem to me the important parts (though, naturally, my treatment of the parts I do cover is meant to be precise and accurate, as far as it goes). Also, I’m aiming at a judicious blend of the formal and the informal; in other words, I’m trying to provide a gentle introduction to the theory, so that:

  1. You can use important theoretical results to help you actually do design, and

  2. You’ll be able, if you’re so inclined, to go to the more academic texts and understand them.

In the interest of readability, I’ve deliberately written a fairly short book, and I’ve deliberately made each chapter fairly short, too. (I’m a great believer in doling out information in digestible chunks.) Also, every chapter includes a set of exercises (answers to most of which are given in Appendix D at the back of the book), and I do recommend that you have a go at some of those exercises if not all. Some of them are intended to show how to apply the theoretical ideas in practice; others provide (in the answers if not in the exercises as such) additional information on the subject matter, over and above what’s covered in the main body of the text; and still others are meant—for example, by asking you to prove some simple theoretical result—to get you to gain some understanding as to what’s involved in “thinking like a theoretician.” Overall, I’ve tried to give some insight into what design theory is and why it is the way it is.

Prerequisites

My target audience is database professionals: more specifically, database professionals with a more than passing interest in database design. In particular, therefore, I assume you’re reasonably familiar with the relational model, or at least with certain aspects of that model (Chapter 2 goes into more detail on these matters). As already indicated, familiarity with the SQL and Relational Theory book would be a big help. Note: I’d like to mention that I also have a live seminar available based on this book. See www.justsql.co.uk/chris_date/chris_date.htm for further details.

Logical vs. Physical Design

This book is about design theory; by definition, therefore, it’s about logical design, not physical database design. Of course, I’m not saying physical design is unimportant (of course not); but I am saying it’s a distinct activity, separate from and subsequent to logical design. To spell the point out, the “right” way to do design is as follows:

  1. Do a clean logical design first. Then, as a separate and subsequent step:

  2. Map that logical design into whatever physical structures the target DBMS happens to support.[2]

Note, therefore, that the physical design should be derived from the logical design and not the other way around. (Ideally, in fact, the system should be able to derive the physical design “automatically” from the logical design, without the need for human involvement in the process at all.)

To repeat, the book is about design theory. So another thing it’s not about is the various ad hoc design methodologies (entity/relationship modeling and the like) that have been proposed over the years. Of course, I realize that certain of those methodologies are fairly widely used in practice, but the fact remains that they enjoy comparatively little by way of a solid theoretical basis. As a result, they’re mostly beyond the scope of a book like this one. However, I do have a few remarks here and there on such methodologies (especially in Chapter 8, Chapter 15, and Appendix A).

Acknowledgments

I’d like to thank Hugh Darwen, Ron Fagin, David McGoveran, and Andy Oram for their meticulous reviews of earlier drafts of this book. Each of these reviewers helped correct a number of misconceptions on my part (rather more such, in fact, than I like to think). Of course, it goes without saying that any remaining errors are my responsibility. I’d also like to thank Chris Adamson for help with certain technical questions, and my wife Lindy for her support throughout the production of this book, as well as all of its predecessors.

C. J. Date

Healdsburg, California

2012



[1] Now (2012) available from O’Reilly.

[2] DBMS = database management system. Note that there’s a logical difference between a DBMS and a database! Unfortunately, the industry very commonly uses the term database when it means either some DBMS product, such as Oracle, or the particular copy of such a product that happens to be installed on a particular computer. I do not follow that usage in this book. The problem is, if you call the DBMS a database, what do you call the database?
