Database design is probably not the most exciting task in the world, but it’s still important. Before we describe how to go about the design process, let’s look at an example of database design on the run.
Imagine we want to create a database to store student grades for
a university computer science department. We could create a Student_Grades
table to store grades for
each student and each course. The table would have columns for the
given names and the surname of each student as well as for each course
they have taken, the course name, and the percentage result (shown as
Pctg
). We’d have a different row
for each student for each of their courses:
+------------+---------+---------------------------+------+ | GivenNames | Surname | CourseName | Pctg | +------------+---------+---------------------------+------+ | John Paul | Bloggs | Web Database Applications | 72 | | Sarah | Doe | Programming 1 | 87 | | John Paul | Bloggs | Computing Mathematics | 43 | | John Paul | Bloggs | Computing Mathematics | 65 | | Sarah | Doe | Web Database Applications | 65 | | Susan | Smith | Computing Mathematics | 75 | | Susan | Smith | Programming 1 | 55 | | Susan | Smith | Computing Mathematics | 80 | +------------+---------+---------------------------+------+
This is nice and compact, and we can easily access grades for any student or any course. However, we could have more than one student called Susan Smith; in the sample data, there are two entries for Susan Smith and the Computing Mathematics course. Which Susan Smith got an 80? A common way to differentiate duplicate data entries is to assign a unique number to each entry. Here, we can assign a unique Student ID number to each student:
+------------+------------+---------+---------------------------+------+ | StudentID | GivenNames | Surname | CourseName | Pctg | +------------+------------+---------+---------------------------+------+ | 12345678 | John Paul | Bloggs | Web Database Applications | 72 | | 12345121 | Sarah | Doe | Programming 1 | 87 | | 12345678 | John Paul | Bloggs | Computing Mathematics | 43 | | 12345678 | John Paul | Bloggs | Computing Mathematics | 65 | | 12345121 | Sarah | Doe | Web Database Applications | 65 | | 12345876 | Susan | Smith | Computing Mathematics | 75 | | 12345876 | Susan | Smith | Programming 1 | 55 | | 12345303 | Susan | Smith | Computing Mathematics | 80 | +------------+------------+---------+---------------------------+------+
So, the Susan Smith who got 80 is the one with the Student ID number 12345303.
There’s another problem. In our table, John Paul Bloggs has
failed the Computing Mathematics course once with 43 percent, and
passed it with 65 percent in his second attempt. In a relational
database, the rows form a set, and there is no implicit ordering
between them; you might guess that the pass happened after the fail,
but you can’t actually be sure. There’s no guarantee that the newer
grade will appear after the older one, so we need to add information
about when each grade was awarded, say by adding
a year and semester (Sem
):
+------------+------------+---------+---------------------------+------+-----+------+ | StudentID | GivenNames | Surname | CourseName | Year | Sem | Pctg | +------------+------------+---------+---------------------------+------+-----+------+ | 12345678 | John Paul | Bloggs | Web Database Applications | 2004 | 2 | 72 | | 12345121 | Sarah | Doe | Programming 1 | 2006 | 1 | 87 | | 12345678 | John Paul | Bloggs | Computing Mathematics | 2005 | 2 | 43 | | 12345678 | John Paul | Bloggs | Computing Mathematics | 2006 | 1 | 65 | | 12345121 | Sarah | Doe | Web Database Applications | 2006 | 1 | 65 | | 12345876 | Susan | Smith | Computing Mathematics | 2005 | 1 | 75 | | 12345876 | Susan | Smith | Programming 1 | 2005 | 2 | 55 | | 12345303 | Susan | Smith | Computing Mathematics | 2006 | 1 | 80 | +------------+------------+---------+---------------------------+------+-----+------+
Notice that the Student_Grades
table has become a bit
bloated: the student ID, given names, and surname are repeated for
every grade. We could split up the information and create a Student_Details
table:
+------------+------------+---------+ | StudentID | GivenNames | Surname | +------------+------------+---------+ | 12345121 | Sarah | Doe | | 12345303 | Susan | Smith | | 12345678 | John Paul | Bloggs | | 12345876 | Susan | Smith | +------------+------------+---------+
and keep less information in
the Student_Grades
table:
+------------+---------------------------+------+-----+------+ | StudentID | CourseName | Year | Sem | Pctg | +------------+---------------------------+------+-----+------+ | 12345678 | Web Database Applications | 2004 | 2 | 72 | | 12345121 | Programming 1 | 2006 | 1 | 87 | | 12345678 | Computing Mathematics | 2005 | 2 | 43 | | 12345678 | Computing Mathematics | 2006 | 1 | 65 | | 12345121 | Web Database Applications | 2006 | 1 | 65 | | 12345876 | Computing Mathematics | 2005 | 1 | 75 | | 12345876 | Programming 1 | 2005 | 2 | 55 | | 12345303 | Computing Mathematics | 2006 | 1 | 80 | +------------+---------------------------+------+-----+------+
To
look up a student’s grades, we’d need to first look up her Student ID
from the Student_Details
table and
then read the grades for that Student ID from the Student_Grades
table.
There are still issues we haven’t considered. For example, should we keep information on a student’s enrollment date, postal and email addresses, fees, or attendance? Should we store different types of postal address? How should we store addresses so that things don’t break when a student changes his address?
Implementing a database in this way is problematic; we keep running into things we hadn’t thought about and have to keep changing our database structure. Clearly, we can save a lot of reworking by carefully documenting the requirements and then working through them to develop a coherent design.
Get Learning MySQL now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.