In order to be able to add and manipulate data, you first have to create a database. There’s not much to this. You’re creating just a container in which you will add tables. Creating a table is more involved and offers many choices. There are several types of tables from which to choose, some with unique features. When creating tables, you must also decide on the structure of each table: the number of columns, the type of data each column may hold, how the tables will be indexed, and several other factors. However, while you’re still learning, you can accept the default setting for most of the options when creating tables.
There are a few basic things to decide when creating a structure for your data:
The number of tables to include in your database, as well as the table names
For each table, the number of columns it should contain, as well as the column names
For each column, what kind of data is to be stored
For the last part, in the beginning, we’ll use just four types of columns: columns that contain only numbers; columns that contain alphanumeric characters, but not too many (i.e., a maximum of 255 characters); columns that contain plenty of text and maybe binary files; and columns for recording date and time information. This is a good starting point for creating a database and tables. As we get further along, we can expand that list of column data types to improve the performance of your databases.
This chapter contains examples of how to create a database and tables. The text is written on the assumption that you will enter the SQL statements shown on your server, using the mysql client. The exercises at the end of this chapter will require that you make some changes and additions to the database and its tables on your computer. So, when instructed, be sure to try all of the examples on your computer.
The database and the tables that we create in this chapter will be used in several chapters in this book, especially in Part III. In those later chapters, you will be asked to add, retrieve, and change data from the tables you create in this chapter. Exercises in subsequent chapters assume that you have created the tables you are asked to create in this chapter. Thus, in order to get the most value possible from this book, it’s important that you complete the exercises included for each chapter. It will help reinforce what you read, and you will learn more.
Creating a database is simple, mostly because there’s nothing much
to it. Use the SQL statement
CREATE DATABASE. You will have to
provide a name for the database with this SQL statement. You could call it
something bland like
db1. However, let’s do something more
realistic and interesting. I’m a fan of birds, so I’ve used a database of
a fictitious bird-watching website for the examples in this book. Some
birds live in groups, or a colony called a rookery.
To start, let’s create a database that will contain information about
birds and call it
rookery. To do this, enter the following
from within the mysql client:
CREATE DATABASE rookery;
As previously mentioned, this very minimal, first SQL statement will
create a subdirectory called
rookery on the filesystem in the
data directory for MySQL. It won’t create any data. It will just set up a
place to add tables, which will in turn hold data. Incidentally, if you
don’t like the keyword
DATABASE, you can use
. The results are the
You can, though, do a bit more than the SQL statement shown here for
creating a database. You can add a couple of options in which you can set
the default types of characters that will be used in the database and how
data will be sorted or collated. So, let’s drop the
rookery database and create
it again like so:
The first line in this SQL statement is the same as the earlier one—remember, all of this is one SQL statement spread over two lines, ending with the semicolon. The second line, which is new, tells MySQL that the default characters that will be used in tables in the database are Latin letters and other characters. The third line tells MySQL that the default method of sorting data in tables is based on binary Latin characters. We’ll discuss binary characters and binary sorting in a later chapter, but it’s not necessary to understand that at this point. In fact, for most purposes, the minimal method of creating a database without options, as shown earlier, is fine. You can always change these two options later if necessary. I’m only mentioning the options here so that you know they exist if you need to set them one day.
SHOW DATABASES;+--------------------+ | Database | +--------------------+ | information_schema | | rookery | | mysql | | test | +--------------------+
The results here show the
rookery database, and three
other databases that were created when MySQL was installed on the server.
We saw the other three in Starting to Explore Databases,
and we’ll cover them in later chapters of this book as needed.
Before beginning to add tables to the
enter the following command into the mysql client:
This little command will set the new database that was just created as the default database for the mysql client. It will remain the default database until you change it to a different one or until you exit the client. This makes it easier when entering SQL statements to create tables or other SQL statements related to tables. Otherwise, when you enter each table-related SQL statement, you would have to specify each time the database where the table is located.
The next step for structuring a database is to create tables. Although this can be complicated, we’ll keep it simple to start. We’ll initially create one main table and two smaller tables for reference information. The main table will have a bunch of columns, but the reference tables will have only a few columns.
For our fictitious bird-watchers site, the key interest is birds. So we want to create a table that will hold basic data on birds. For learning purposes, we won’t make this an elaborate table. Enter the following SQL statement into mysql on your computer:
This SQL statement creates the table
birds with five
fields, or columns, with commas separating the information about each
column. Note that all the columns together are contained in a pair of
parentheses. For each colum, we specify the name, the type, and optional
settings. For instance, the information we give about the first column
INT (meaning it has to contain
The names of the columns can be anything other than words that are reserved for SQL statements, clauses, and functions. Actually, you can use a reserve word, but it must always be given within quotes to distinguish it. You can find a list of data types from which to choose on the websites of MySQL and MariaDB, or in my book, MySQL in a Nutshell.
We created this table with only five columns. You can have plenty of columns (up to 255), but you shouldn’t have too many. If a table has too many columns, it can be cumbersome to use and the table will be sluggish when it’s accessed. It’s better to break data into multiple tables.
The first column in the
birds table is a simple
bird_id. It will be the primary key column on which data will be indexed—hence the
PRIMARY KEY. We’ll discuss the importance of the
primary key later.
The next column will contain the scientific name of each bird (e.g.,
Charadrius vociferus, instead of
Killdeer). You might think that the
scientific_name column would be the ideal identifier to use
as the primary key on which to index the
birds table, and
that we wouldn’t need the
bird_id column. But the scientific
name can be very long and usually in Latin or Greek (or sometimes a mix of
both languages), and not everyone is comfortable using words from these
languages. In addition, would be awkward to enter the scientific name of a
bird when referencing a row in the table. We’ve set the
scientific_name column to have a variable-width character
data type (
VARCHAR). The 255 that we specify in the
parentheses after it sets the maximum size (255 should be sufficient for
the long names we’ll need to accommodate).
If the scientific name of a bird has fewer than 255 characters, the
storage engine will reduce the size of the column for the row. This is
different from the
CHAR column data type. If the data in a
CHAR column is less
than its maximum, space is still allocated for the full width that you
set. There are trade-offs with these two basic character data types. If
the storage engine knows exactly what to expect from a column, tables run
faster and can be indexed more easily with a
VARCHAR column can use less space on the server’s
hard drive and is less prone to fragmentation. That can improve
performance. When you know for sure that a column will have a set number
of characters, use
CHAR. When the width may vary, use
Next, we set the column data type for the
of each bird to a variable-width character column of only 50 characters at
The fourth column (
family_id) will be used as
identification numbers for the family of birds to which each bird belongs.
They are integer data types (i.e.,
INT). We’ll create another table for more information on the families.
Then, when manipulating data, we can join the two tables, use a number to
identify each family, and link each bird to its family.
The last column is for the description of each bird. It’s a
TEXT data type, which means that it’s a variable-width column, and it can
hold up 65,535 bytes of data for each row. This will allow us to enter
plenty of text about each bird. We could write multiple pages describing a
bird and put it in this column.
There are additional factors to consider when searching for a bird in a database, so there are many columns we could add to this table: information about migratory patterns, notable features for spotting them in the wild, and so on. In addition, there are many other data types that may be used for columns. We can have columns that allow for larger and smaller numbers, or for binary files to be included in each row. For instance, you might want a column with a binary data type to store a photograph of each bird. However, this basic table gives you a good sampling of the possibilities when creating tables.
To see how the table looks, use the
DESCRIBE statement. It displays
information about the columns of a table, or the table schema—not the data
itself. To use this SQL statement to get information on the table we just
created, you would enter the following SQL statement:
DESCRIBE birds;+-----------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-----------------+--------------+------+-----+---------+----------------+ | bird_id | int(11) | NO | PRI | NULL | auto_increment | | scientific_name | varchar(255) | YES | UNI | NULL | | | common_name | varchar(50) | YES | | NULL | | | family_id | int(11) | YES | | NULL | | | description | text | YES | | NULL | | +-----------------+--------------+------+-----+---------+----------------+
Notice that these results are displayed in a table format made with ASCII characters. It’s not very slick looking, but it’s clean, quick, and provides the information requested. Let’s study this layout, not the content, per se.
The first row of this results set contains column headings describing the rows of information that follow it. In the first column of this results set, Field contains the fields or columns of the table created.
The second column, Type, lists the data type
for each field. Notice that for the table’s columns in which we specified
the data type
VARCHAR with the specific widths within
parentheses, those settings are shown here (e.g.,
varchar(255)). Where we didn’t specify the size for the
INT columns, the defaults were assumed and are shown here.
We’ll cover later what
INT(11) means and discuss the other
possibilities for integer data types.
The third column in the preceding results, Null, indicates whether each field may contain NULL values. NULL is nothing; it’s nonexistent data. This is different from blank or empty content in a field. That may seem strange: just accept that there’s a difference at this point. You’ll see that in action later in this book.
The fourth column, Key, indicates whether a
field is a key field—an indexed column. It’s not an indexed column if the
result is blank, as it is with
common_name. If a column is
indexed, the display will say which kind of index. Because of the limited
space permitted in the display, it truncates the words. In the example
bird_id column is a primary key, shortened to
PRI in this display. We set
another type of key or index, one called
UNIQUE, which is
The next-to-last column in the display,
Default, would contain any default value set for each
field. We didn’t set any when creating the
birds table, but
we could have done so. We can do that later.
The last column, Extra, provides any extra
information the table maintains on each column. In the example shown, we
can see that the values for
bird_id will be incremented
automatically. There’s usually nothing else listed in this column.
If we don’t like something within the structure of the table we
created, we can use the
ALTER TABLE statement to change it
(this SQL statement is covered in Chapter 5).
If you made some mistakes and just want to start over, you can delete the
table and try again to create it. To delete a table completely (including
its data), you can use the
DROP TABLE statement, followed by
the table name. Be careful with this SQL statement, as it’s not reversible
and it deletes any data in the table.
Incidentally, when using the mysql client, you can press the up arrow on
your keyboard to get to the previous lines you entered. So if you create
a table, then run the
DESCRIBE statement and catch a
mistake, you can just drop the table, and use the up arrow to go back to
your previous entry in which you created the table. Use the left arrow
to move the cursor over to the text you want to change and fix it. When
you’ve finished modifying the
CREATE TABLE statement, press
Enter. The modified
CREATE TABLE statement will then be
sent to the server.
Those were a lot of details to absorb in the last section. Let’s take a break from creating
tables and enter data in the
birds table. We’ll use an
INSERT statement, which was covered briefly in Chapter 3, and will be covered in more detail in
the next section. For now, don’t worry too much about understanding all of
the possibilities with the
INSERT statement. Just enter the
following on your server using the mysql client:
'Great Northern Loon'
' White-breasted Nuthatch'
'North Island Brown Kiwi'
This will create six rows of data for six birds. Enter the following from the mysql client to see the contents of the table:
SELECT * FROM birds;+---------+----------------------+-------------------+-----------+-------------+ | bird_id | scientific_name | common_name | family_id | description | +---------+----------------------+-------------------+-----------+-------------+ | 1 | Charadrius vociferus | Killdeer | NULL | NULL | | 2 | Gavia immer | Great Northern... | NULL | NULL | | 3 | Aix sponsa | Wood Duck | NULL | NULL | | 4 | Chordeiles minor | Common Nighthawk | NULL | NULL | | 5 | Sitta carolinensis | White-breasted... | NULL | NULL | | 6 | Apteryx mantelli | North Island... | NULL | NULL | +---------+----------------------+-------------------+-----------+-------------+
Let’s create another table for a different database. We have
information on birds in the
rookery database. Let’s create
another database that contains information about people who are interested
in bird-watching. We’ll call it
birdwatchers and we’ll create
one table for it that we’ll call
humans, to correlate with
the name of
This isn’t much of a table; we’re not collecting much information on members, but it will do well for now. Let’s enter some data into this table. The following adds four people to our table of members of the site:
This enters information for four humans. Notice that we left the first column NULL so that MySQL can assign an identification number automatically and incrementally.
DESCRIBE statement, there’s another way to look at how a table is structured. You can
SHOW CREATE TABLE statement. This basically shows how you might enter the
CREATE TABLE to create an existing table,
perhaps in a different database. What’s particularly interesting and
useful about the
SHOW CREATE TABLE statement is that it shows
the default settings assumed by the server, ones that you might not have
specified when you ran the
CREATE TABLE statement. Here’s how
you would enter this statement, with the results shown after it:
SHOW CREATE TABLE birds \G*************************** 1. row *************************** Table: birds Create Table: CREATE TABLE `birds` ( `bird_id` int(11) NOT NULL AUTO_INCREMENT, `scientific_name` varchar(255) COLLATE latin1_bin DEFAULT NULL, `common_name` varchar(50) COLLATE latin1_bin DEFAULT NULL, `family_id` int(11) DEFAULT NULL, `description` text COLLATE latin1_bin, PRIMARY KEY (`bird_id`), UNIQUE KEY `scientific_name` (`scientific_name`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_bin
As mentioned earlier, there are more options that you can set for
each column; if you don’t specify them, the server will use the default
choices. Here you can see those default settings. Notice that we did not
set a default value for any of the fields (except the first one when we
said to use an automatically incremented number), so it set each column to
a default of NULL. For the third column, the
column, the server set the set of characters (i.e., the alphabet, numbers,
and other characters) by which it will collate the data in that column to
latin1_bin (i.e., Latin binary characters). The
server did the same for three other columns. That’s because of how we set
the database at the beginning of this chapter, in the second
CREATE DATABASE statement. This is where
that comes into play. We could set a column to a different one from the
one we set for the database default, but it’s usually not
You may have noticed in looking at the results that the options for
bird_id column don’t indicate that it’s a primary key,
although we specified that in
CREATE TABLE. Instead, the list
of columns is followed by a list of keys or indexes used in the table.
Here it lists the primary key and specifies that that index is based on
bird_id. It then shows a unique key. For that kind of key, it
gives a name of the index,
scientific_name, which is the same
as the column it indexes, and it then shows in parentheses a lists of
columns from which the index is drawn. That could be more than one column,
but it’s just one here. We’ll cover indexes in Chapter 5 (see Indexes).
There’s one more aspect you should note in the results of
CREATE TABLE. Notice that the last line shows a few other settings
after the closing parentheses for the set of columns. First is the type of
table used, or rather the type of storage engine used for this table. In
this case, it’s MyISAM, which is the default for
many servers. The default for your server may be different. Data is stored
and handled in different ways by different storage engines. There are
advantages and disadvantages to each.
The other two settings are the default character set
latin1) and the default collation (
in the table. These come from the default values when the database was
created, or rather they came indirectly from there. You can set a
different character and collation, and you can even set a different
character set and collation for an individual column.
Let me give you an example where setting explicit values for the
character set and collation might be useful. Suppose you have a typical
database for a bird-watcher group located in England with most of its
common names written in English. Suppose further that the site attracts
bird-watchers from other countries in Europe, so you might want to include
common bird names in other languages. Let’s say that you want to set up a
table for the Turkish bird-watchers. For that table, you would use a
different character set and collation, because the Turkish alphabet
contains both Latin and other letters. For the character set, you would
latin5, which has both Latin and other letters. For
collation, you would use
latin5_turkish_ci, which orders text
based on the order of the letters in the Turkish alphabet. To make sure
you don’t forget to use this character set and collation when adding
columns to this table later, you could set the
COLLATE for the table to these values.
Before moving on, let me make one more point about the
CREATE TABLE statement: if you want to create a table with plenty
of special settings different from the default, you can use the results of
SHOW CREATE TABLE statement as a starting point for
constructing a more elaborate
CREATE TABLE statement. Mostly
you would use it to see the assumptions that the server made when it
created a table, based on the default settings during installation.
The next table we’ll create for the examples in this book is
bird_families. This will hold information about bird
families, which are groupings of birds. This will tie into the
family_id column in the
birds table. The new
table will save us from having to enter the name and other information
related to each family of birds for each bird in the
We’re creating three columns in the table. The first is the most
interesting for our purposes here. It’s the column that will be indexed
and will be referenced by the
birds table. That sounds like
there is a physical connection or something similar within the
birds table, but that’s not what will happen. Instead, the
connection will be made only when we execute an SQL statement, a query
referencing both tables. With such SQL statements, we’ll join the
bird_families table to the
birds table based on
family_id columns in both. For instance, we would do this
when we want a list of birds along with their corresponding family names,
or maybe when we want to get a list of birds for a particular
Now we can put all the information we want about a family of birds
in one row. When we enter data in the
birds table, we’ll
family_id identification number that will
reference a row of the
bird_families table. This also helps
to ensure consistency of data: there’s less chance of spelling deviations
when you only enter a number and not a Latin name. It also saves space
because you can store information in one row of
and refer to it from hundreds of rows in
birds. We’ll see
soon how this works.
scientific_name column will hold the scientific
name of the family of birds (e.g., Charadriidae). The
third column is basically for the common names of families (e.g.,
Plovers). But people often associate several common
names to a family of birds, as well as vague names for the types of birds
contained in the family. So we’ll just call the column
Let’s next create a table for information about the orders of the
birds. This is a grouping of families of birds. We’ll name it
bird_orders. For this table, let’s try out some of the extra
options mentioned earlier. Enter the following SQL statement:
This SQL statement creates a table named
with four columns to start. The first one,
order_id, is the
key in which rows will be referenced from the
table. This is followed by
scientific_name for the scientific
name of the order of birds, with a data type of
We’re allowing the maximum number of characters for it. It’s more than
we’ll need, but there won’t be many entries in this table and it’s
difficult to guess what what the longest description will be. So we’ll set
it to the maximum allowed for that data type. We’re naming this column
brief_description, as we did in the earlier
Because all three tables that we’ve created so far have similar
names for some of the columns (e.g.,
may cause us a little trouble later if we try to join all of these tables
together. It might seem simpler to use distinct names for these columns in
each of these tables (e.g.,
we can resolve that ambiguity easily when necessary.
In the previous SQL statement, notice that we have a column for an
image to represent the order of birds. We might put a photo of the most
popular bird of the order or a drawing of several birds from the order.
Notice that for this image file, the data type we’re using is a
BLOB. While the name is cute and evocative,
it also stands for binary large object. We can store
an image file, such as a JPEG file, in the column. That’s not always a
good idea. It can make the table large, which can be a problem when
backing up the database. It might be better to store the image files on
the server and then store a file path or URL address in the database,
pointing to where the image file is located. I’ve included a BLOB here,
though, to show it as a possibility.
After the list of columns, we’ve included the default character set
and collation to be used when creating the columns. We’re using UTF-8 (i.e., UCS Transformation Format, 8-bit), because some
of the names may include characters that are not part of the default
latin1 character set. For instance, if
our fictitious bird-watcher site included German words, the column
brief_description would be able to accept the letters with
umlauts over them (i.e., ä). The character set
utf8 allows for such letters.
For a real bird-watching database, both the
bird_orders tables would have
more columns. There would also be several more tables than the few we’re
creating. But for our purposes, these few tables as they are here will be
fine for now.
You have many more possibilities when creating tables. There are options for setting different types of storage engines. We touched on that in this chapter, but there’s much more to that. You can also create some tables with certain storage engines that will allow you to partition the data across different locations on the server’s hard drives. The storage engine can have an impact on the table’s performance. Some options and settings are rarely used, but they’re there for a reason. For now, we’ve covered enough options and possibilities when creating tables.
What we have covered in this chapter may actually be a bit
overwhelming, especially the notion of reference tables like
bird_orders. Their purpose
should become clearer in time. Chapter 5
provides some clarification on tables, and will show you how to alter
them. There are additional examples of inserting and selecting data
interspersed throughout that chapter. Before moving on, make sure to
complete the exercises in the following section. They should help you to
better understand how tables work and are used.
Besides the SQL statements you entered on your MySQL server while reading this chapter, here are a few exercises to further reinforce what you’ve learned about creating databases and tables. In some of these exercises, you will be asked to create tables that will be used in later chapters, so it’s important that you complete the exercises that follow.
DROP TABLE statement to delete the table
bird_orders that we created earlier in this chapter. Look
CREATE TABLE statement that we used to create
that table. Copy or type it into a text editor and make changes to
that SQL statement: change the
TEXT column type. Watch out for extra commas when you
remove columns from the list. When you’re finished, copy that modified
SQL statement into the mysql
monitor on your computer and press Enter to execute it.
If you get an error, look at the error message (which will probably be confusing) and then look at the SQL statement in your text editor. Look where you made changes and see if you have any mistakes. Make sure you have keywords and values in the correct places and there are no typos. Fix any mistakes you find and try running the statement again. Keep trying until you succeed.
I mentioned in this chapter that we might want to store data
related to identifying birds. Instead of putting that data in the
birds table, create a table for that data, which will be
a reference table. Try creating that table with the
TABLE statement. Name it
it three columns: the first column should be named
wing_id with a data type of
CHAR with the
maximum character width set to 2. Make that column the index, as a
UNIQUE key, but not an
enter two-letter codes manually to identify each row of data—a
feasible task because there will be probably only six rows of data in
this table. Name the second column
wing_shape and set its
data type to
CHAR with the maximum character width set to
25. This will be used to describe the type of wings a bird may have
(e.g., tapered wings). The third column should be called
wing_example and make it a
BLOB column for
storing example images of the shapes of wings.
After creating the
birds_wing_shapes table in the
previous exercise, run the
SHOW CREATE TABLE statement
for that table in mysql. Run it
twice: once with the semi-colon at the end of the SQL statement and
another time with
\G to see how the different displays
can be useful given the results.
Copy the results of the second statement, the
TABLE statement it returns. Paste that into a text editor. Then
DROP TABLE statement to delete the table
birds_wing_shapes in mysql.
In your text editor, change a few things in the
TABLE statement you copied. First, change the storage
engine—the value of
ENGINE for the table—to a MyISAM
table, if it’s not already. Next, change the character set and
collation for the table. Set the character set to
and the collation to
Now copy the
CREATE TABLE statement you modified in
your text editor and paste it into the mysql monitor and press
to run it. If you get an error, look at the confusing error message
and then look at the SQL statement in your text editor. Look where you
made changes and see if you have any mistakes. Make sure you have
keywords and values in the correct places and there are no typos. Fix
any mistakes you find and try running the statement again. Keep trying
to fix it until you’re successful. Once you’re successful, run the
DESCRIBE statement for the table to see how it
Create two more tables, similar to
birds_wing_shapes. One table will store information on
the common shapes of bird bodies, and the other will store information
on the shapes of their bills. They will also be used for helping
bird-watchers to identify birds. Call these two tables
birds_body_shapes table, name the first
body_id, set the data type to
CHAR(3), and make it a
UNIQUE key column.
Name the second column
CHAR(25), and the third column
making it a
BLOB column for storing images of the bird
birds_bill_shapes table, create three
bill_example, making it a
BLOB column for storing images of the bird shapes. Create
both tables with the
ENGINE set to a MyISAM, the DEFAULT
utf8, and the
utf8_general_ci. Run the
SHOW CREATE TABLE
statement for each table when you’re finished to check your