Chapter 4. Smart Table Design: Why be normal?

image with no caption

You’ve been creating tables without giving much thought to them. And that’s fine, they work. You can SELECT, INSERT, DELETE, and UPDATE with them. But as you get more data, you start seeing things you wish you’d done to make your WHERE clauses simpler. What you need is to make your tables more normal.

Two fishy tables

Jack and Mark both created tables to store information about record-setting fish. Mark’s table has columns for the species and common names of the fish, its weight, and where it was caught. It doesn’t include the names of the people who caught the fish.

image with no caption
image with no caption

Jack’s table has the common name and weight of the fish, but it also contains the first and last names of the people who caught them, and it breaks down the location into a column containing the name of the body of water where the fish was caught, and a separate state column.

image with no caption

Brain Power

SQL is the language used by relational databases. What do you think “relational” means in an SQL database?

A table is all about relationships

SQL is known as a Relational Database Management System, or RDBMS. Don’t bother memorizing it. We only care about the word RELATIONAL[1]. All this means to you is that to design a killer table, you need to consider how the columns relate to each other to describe a thing.

The challenge is to describe the thing using columns in a way that makes getting the information out of it easy. This depends on what you need from the table, but there are some very broad steps you can follow when you’re creating a table.

  1. Pick your thing, the one thing you want your table to describe.

    Note

    What’s the main thing you want your table to be about?

  2. Make a list of the information you need to know about your one thing when you’re using the table.

    Note

    How will you use this table?

  3. Using the list, break down the information about your thing into pieces you can use for organizing your table.

    Note

    How can you most easily query this table?

image with no caption

We could, but we don’t need the data broken down to that level.

At least, not in this case. If Jack had been writing an article about the best places to go on vacation and catch a big fish, then he might have wanted the street number and name so readers could find accommodations nearby.

But Jack only needed location and state, so he only added as many columns as he needed to save space in his database. At that point, he decided his data was broken down enough—it is atomic.

Brain Power

What do you think the word atomic means in terms of SQL data?

Atomic data

What’s an atom? A little piece of information that can’t or shouldn’t be divided. It’s the same for your data. When it’s ATOMIC, that means that it’s been broken down into the smallest pieces of data that can’t or shouldn’t be divided.

30 minutes or it’s free

Consider a pizza delivery guy. To get to where he’s going, he just needs a street number and address in a single column. For his purposes, that’s atomic. He never needs to look for a single street number on its own.

In fact, if his data were broken into street number and street name, his queries would have to be longer and more complicated, making it take him longer to get the pizza to your front door.

image with no caption

Location, location, location

Now consider a realtor. He might want to have a separate column for the street number. He may want to query on a given street to see all the houses for sale by street number. For him, street number and street name are each atomic.

image with no caption

Atomic data and your tables

There are some questions you can ask to help you figure out what you need to put in your tables:

  1. What is the one thing your table describes?

    Does your table describe clowns, cows, doughnuts, people?

    image with no caption
  2. How will you use the table to get at the one thing?

    Note

    Design your table to be easy to query!

    image with no caption
  3. Do your columns contain atomic data to make your queries short and to the point?

    image with no caption

Reasons to be normal

When your data consultancy takes off and you need to hire more SQL database designers, wouldn’t it be great if you didn’t need to waste hours explaining how your tables work?

Well, making your tables NORMAL means they follow some standard rules your new designers will understand. And the good news is, our tables with atomic data are halfway there.

Making your data atomic is the first step in creating a NORMAL table.

The benefits of normal tables

  1. Normal tables won’t have duplicate data, which will reduce the size of your database.

    image with no caption
  2. With less data to search through, your queries will be faster.

    image with no caption

    Because, even when your tables are tiny, it adds up. And tables grow. If you begin with a normalized table, you won’t have to go back and change your table when your queries go too slowly.

Clowns aren’t normal

Remember the clown table? Clown tracking has become a nationwide craze, and our old table isn’t going to cut it because the appearance and activities columns contain so much data. For our purposes, this table is not atomic.

Note

These two columns are really difficult to query because they contain so much data!

clown_info

name

last_seen

appearance

activities

Elsie

Cherry Hill Senior Center

F, red hair, green dress, huge feet

balloons, little car

Pickles

Jack Green’s party

M, orange hair, blue suit, huge feet

mime

Snuggles

Ball-Mart

F, yellow shirt, baggy blue pants

horn, umbrella

Mr. Hobo

Eric Gray’s Party

M, cigar, black hair, tiny hat

violin

Clarabelle

Belmont Senior Center

F, pink hair, huge flower, blue dress

yelling, dancing

Scooter

Oakland Hospital

M, blue hair, red suit, huge nose

balloons

Zippo

Millstone Mall

F, orange suit, baggy pants

singing

Babe

Earl’s Autos

F, all pink and sparkly

balancing, little car

Bonzo

Dickson Park

M, in drag, polka dotted dress

singing, dancing

Sniffles

Tracy’s

M, green and purple suit, pointy nose

climbing into tiny car

Halfway to 1NF

Remember, our table is only about halfway normal when it’s got atomic data in it. When we’re completely normal we’ll be in the FIRST NORMAL FORM or 1NF.

To be 1NF, a table must follow these two rules:

Each row of data must contain atomic values.

Note

We already know how to do this.

Each row of data must have a unique identifier, known as a Primary Key.

Note

To make our tables completely normal, we need to give each record a Primary Key.

Brain Power

What types of columns do you think would make good Primary Keys?

image with no caption

PRIMARY KEY rules

The column in your table that will be your primary key has to be designated as such when you create the table. In a few pages, we’ll create a table and designate a primary key, but before that, let’s take a closer look at what a primary key is.

image with no caption

A primary key is a column in your table that makes each record unique.

image with no caption

The primary key is used to uniquely identify each record

Which means that the data in the primary key column can’t be repeated. Consider a table with the columns shown below. Do you think any of those would make good primary keys?

image with no caption

Watch it!

Take care using SSNs as the Primary Keys for your records.

With identity theft only increasing, people don’t want to give out SSNs—and with good reason. They’re too important to risk. Can you absolutely guarantee that your database is secure? If it’s not, all those SSNs can be stolen, along with your customers’ identities.

image with no caption

A primary key can’t be NULL

If it’s null, it can’t be unique because other records can also be NULL.

image with no caption

The primary key must be given a value when the record is inserted

When you insert a record without a primary key, you run the risk of ending up with a NULL primary key and duplicate rows in your table, which violates First Normal Form.

image with no caption

The primary key must be compact

A primary key should contain only the information it needs to to be unique and nothing extra.

image with no caption

The primary key values can’t be changed

If you could change the value of your key, you’d risk accidentally setting it to a value you already used. Remember, it has to remain unique.

Brain Power

Given all these rules, can you think of a good primary key to use in a table?

Look back through the tables in the book. Do any of them have a column that contains truly unique values?

image with no caption

The best primary key may be a new primary key.

When it comes to creating primary keys, your best bet may be to create a column that contains a unique number. Think of a table with people’s info, but with an additional column containing a number. In the example below, let’s call it ID.

If it weren’t for the ID column, the records for John Brown would be identical. But in this case, they’re actually two different people. The ID column makes these records unique. This table is in first normal form.

image with no caption

Geek Bits

There’s a big debate in the SQL world about using synthetic, or made-up, primary keys (like the ID column above) versus using natural keys—data that is already in the table (like a VIN number on a car or SSN number). We won’t take sides, but we will discuss primary keys in more detail in Chapter 7.

Getting to NORMAL

It’s time to step back and normalize our tables. We need to make our data atomic and add primary keys. Creating a primary key is normally something we do when we write our CREATE TABLE code.

Brain Power

Do you remember how to add columns to an existing table?

Fixing Greg’s table

From what you’ve seen so far, this is how you’d have to fix Greg’s table:

Fixing Greg’s table Step 1: SELECT all of your data and save it somehow.

Fixing Greg’s table Step 2: Create a new normal table.

Fixing Greg’s table Step 3: INSERT all that old data into the new table, changing each row to match the new table structure.

So now you can drop your old table.

image with no caption

So, we know that Greg’s table isn’t perfect.

It’s not atomic and it has no primary key. But luckily for Greg, you don’t have to live with the old table, and you don’t have to dump your data.

We can add a primary key to Greg’s table and make the columns more atomic using just one new command. But first, let’s take a little trip to the past...

The CREATE TABLE we wrote

Greg needs a primary key, and after all the talk about atomic data, he realizes there are a few things he could do to make his columns more atomic. Before we look at how to fix the existing table, let’s look at how we could have created the table in the first place!

Here’s the table we created way back in Chapter 1.

image with no caption

Brain Power

But what if you don’t have your old CREATE TABLE printed anywhere? Can you think of some way to get at the code?

Show me the

What if you use the DESCRIBE my_contacts command to look at the code you used when you set up the table? You’ll see something that looks a lot like this:

image with no caption

But we really want to look at the CREATE code here, not the fields in the table, so we can figure out what we should have done at the very beginning without having to write the CREATE statement over again.

The statement SHOW CREATE_TABLE will return a CREATE TABLE statement that can exactly recreate our table, minus any data in it. This way, you can always see how the table you are looking at could be created. Try it:

SHOW CREATE TABLE my_contacts;

Time-saving command

Take a look at the code we used to create the table in The CREATE TABLE we wrote, and the code below that the SHOW CREATE TABLE my_contacts gives you. They aren’t identical, but if you paste the code below into a CREATE TABLE command, the end result will be the same. You don’t need to remove the backticks or data settings, but it’s neater if you do.

image with no caption

Although you could make the code neater (by removing the last line and backticks), you can just copy and paste it to create a table.

Note

Unless you’ve deleted the original table, you’ll have to give this one a new name.

The CREATE TABLE with a PRIMARY KEY

Here’s the code our SHOW CREATE TABLE my_contacts gave us. We removed the backticks and last line. At the top of the column list we added a contact_id column that we’re setting to NOT NULL, and at the bottom of the list, we’re add a line PRIMARY KEY, which we set to use our new contact_id column as the primary key.

image with no caption

1, 2, 3... auto incrementally

Adding the keyword AUTO_INCREMENT to our contact_id column makes our SQL software automatically fill that column with a value that starts on row 1 with a value of 1 and goes up in increments of 1.

image with no caption
image with no caption

What do you think will happen?

Better yet, try it out for yourself and see what happens.

image with no caption

You won’t have to start over; instead, you can use an ALTER statement.

A table with data in it doesn’t have to be dumped, then dropped, then recreated. We can actually change an existing table. But to do that, we’re going to borrow the ALTER statement and some of its keywords from Chapter 5.

Adding a PRIMARY KEY to an existing table

Here’s the code to add an AUTO_INCREMENT primary key to Greg’s my_contacts table. (It’s a long command, so you’ll need to turn your book.)

image with no caption

Brain Power

Do you think that this will add values to the new contact_id column for records already in the table or only for newly inserted records? How can you check?

ALTER TABLE and add a PRIMARY KEY

Try the code yourself. Open your SQL terminal. USE the gregs_list database, and type in this command:

image with no caption

To see what happened to your table, try a SELECT * from my_contacts;

image with no caption

Will Greg get his phone number column? Turn to Chapter 5 to find out.

Your SQL Toolbox

You’ve got Chapter 4 under your belt. Look at all the new tools you’ve added to your toolbox now! For a complete list of tooltips in the book, see Appendix C.

image with no caption


[1] Some people think that RELATIONAL means multiple tables relating to each other. That’s not correct.

Get Head First SQL now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.