Data Definition Language

The DDL is used to define the structure of data containers and objects within the database. The most common of these containers and objects are tables, indexes, and views. As you’ll see, most objects are defined with a variation of the CREATE command, such as CREATE TABLE or CREATE VIEW. The DROP command is used to delete an existing object (and all of the data it might contain). Examples include DROP TABLE or DROP INDEX. Because the command syntax is so different, statements like CREATE TABLE or CREATE INDEX are usually considered to be separate commands, and not variations of a single CREATE command.

In a sense, the DDL commands are similar to C/C++ header files. DDL commands are used to define the structure, names, and types of the data containers within a database, just as a header file typically defines type definitions, structures, classes, and other data structures. DDL commands are typically used to set up and configure a brand new database before data is entered.

Note

DDL commands define the basic structure of the database and are typically run when a new database is created.

DDL commands are often held in a script file, so that the structure of the database can be easily recreated. Sometimes, especially during development, you may need to recreate only part of the database. To help support this, most CREATE commands in SQLite have an optional IF NOT EXISTS clause.

Normally, a CREATE statement will return an error if an object with the requested name already exists. If the optional IF NOT EXISTS clause is present, then this error is suppressed and nothing is done, even if the structure of the existing object and the new object are not compatible. Similarly, most DROP statements allow an optional IF EXISTS clause that silently ignores any request to delete an object that isn’t there.

In the examples that follow, the IF EXISTS and IF NOT EXISTS command variations are not explicitly called out. Please see the SQLite command reference in Appendix C for the full details on the complete syntax supported by SQLite.

Tables

The most common DDL command is CREATE TABLE. No data values can be stored in a database until a table is defined to hold that data. At the bare minimum, the CREATE TABLE command defines the table name, plus the name of each column. Most of the time you’ll want to define a type for each column as well, although types are optional when using SQLite. Optional constraints, conditions, and default values can also be assigned to each column. Table-level constraints can also be assigned, if they involve multiple columns.

In some large RDBMS systems, CREATE TABLE can be quite complex, defining all kinds of storage options and configurations. SQLite’s version of CREATE TABLE is somewhat simpler, but there are still a great many options available. For full explanation of the command, see CREATE TABLE in Appendix C.

The basics

The most basic syntax for CREATE TABLE looks something like this:

CREATE TABLE table_name
(
   column_name  column_type,
   [...]
);

A table name must be provided to identify the new table. After that, there is just a simple list of column names and their types. Table names come from a global namespace of all identifiers, including the names of tables, views, and indexes.

Clear and concise identifier names are important. Like the database design itself, some careful thought should be put into the table name, trying to pin down the precise meaning of the data it contains. Much the same could be said of column names. Table names and column names tend to be referenced frequently in queries, so there is some desire to keep them as brief as possible while still keeping their purpose clear.

Column types

Most databases use strong, static column typing. This means that the elements of a column can only hold values compatible with the column’s defined type. SQLite utilizes a dynamic typing technique known as manifest typing. For each row value, manifest typing records the value’s type along with the value data. This allows nearly any element of any row to hold almost any type of value.

In the strictest sense, SQLite supports only five concrete datatypes. These are known as storage classes, and represent the different ways SQLite might choose to store data on disk. Every value has one of these five native storage classes:

NULL

A NULL is considered its own distinct type. A NULL type does not hold a value. Literal NULLs are represented by the keyword NULL.

Integer

A signed integer number. Integer values are variable length, being 1, 2, 3, 4, 6, or 8 bytes in length, depending on the minimum size required to hold the specific value. Integer have a range of −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807, or roughly 19 digits. Literal integers are represented by any bare series of numeric digits (without commas) that does not include a decimal point or exponent.

Float

A floating-point number, stored as an 8-byte value in the processor’s native format. For nearly all modern processors, this is an IEEE 754 double-precision number. Literal floating-point numbers are represented by any bare series of numeric digits that include a decimal point or exponent.

Text

A variable-length string, stored using the database encoding (UTF-8, UTF-16BE, or UTF-16LE). Literal text values are represented using character strings in single quotes.

BLOB

A length of raw bytes, copied exactly as provided. Literal BLOBs are represented as hexadecimal text strings preceded by an x. For example, the notation x'1234ABCD' represents a 4-byte BLOB. BLOB stands for Binary Large OBject.

SQLite text and BLOB values are always variable length. The maximum size of a text or BLOB value is limited by a compile-time directive. The default limit is exactly one billion bytes, or slightly less than a full gigabyte. The maximum value for this directive is two gigabytes.

Since the elements of most columns can hold any value type, the “type” of a column is a somewhat misleading concept. Rather than being an absolute type, as in most databases, an SQLite column type (as defined in CREATE TABLE) becomes more of a suggestion than a hard and fast rule. This is known as a type affinity, and essentially represents a desired category of type. Each type affinity has specific rules about what types of values it can store, and how different values will be converted when stored in that column. Generally, a type affinity will cause a conversion or migration of types only if it can be done without losing data or precision.

Each table column must have one of five type affinities:

Text

A column with a text affinity will only store values of type NULL, text, or BLOB. If you attempt to store a value with a numeric type (float or integer) it will be converted into a text representation before being stored as a text value type.

Numeric

A column with a numeric affinity will store any of the five types. Values with integer and float types, along with NULL and BLOB types, are stored without conversion. Any time a value with a text type is stored, an attempt is made to convert the value to a numeric type (integer or float). Assuming the conversion works, the value is stored in an appropriate numeric type. If the conversion fails, the text value is stored without any type of conversion.

Integer

A column with an integer affinity works essentially the same as a numeric affinity. The only difference is that any value with a float type that lacks a fractional component will be converted into an integer type.

Float

A column with a floating-point affinity also works essentially the same as a numeric affinity. The only difference is that most values with integer types are converted into floating-point values and stored as a float type.

None

A column with a none affinity has no preference over storage class. Each value is stored as the type provided, with no attempt to convert anything.

Since type affinities are not part of the SQL standard, SQLite has a series of rules that attempt to map traditional column types to the most logical type affinity. The type affinity of a column is determined by the declared type of the column, according to the following rules (substring matches are case-insensitive):

  1. If no column type was given, then the column is given the none affinity.

  2. If the column type contains the substring “INT,” then the column is given the integer affinity.

  3. If the column type contains any of the substrings “CHAR,” “CLOB,” or “TEXT,” then the column is given the text affinity.

  4. If the column type contains the substring “BLOB,” then the column is given the none affinity.

  5. If the column type contains any of the substrings “REAL,” “FLOA,” or “DOUB,” then it is given the float affinity.

  6. If no match is found, the column is assigned the numeric affinity.

As implied by the first rule, the column type is completely optional. SQLite will allow you to create a table by simply naming the columns, such as CREATE TABLE t ( i, j, k);. You’ll also notice that there isn’t any specific list of column types that are recognized. You can use any column type you want, even making up your own names.

This might sound a bit fast and loose for a typing system, but it works out quite well. By keying off specific substrings, rather than trying to define specific types, SQLite is able to handle SQL statements (and their database-specific types) from just about any database, all while doing a pretty good job of mapping the types to an appropriate affinity. About the only type you need to be careful of is “floating point.” The “int” in “point” will trigger rule 2 before the “floa” in “floating” will get to rule 5, and the column affinity will end up being integer.

Column constraints

In addition to column names and types, a table definition can also impose constraints on specific columns or sets of columns. A more complete view of the CREATE TABLE command looks something like this:

CREATE TABLE table_name
(
   column_name  column_type    column_constraints...,
   [... ,]

   table_constraints,
   [...]
);

Here we see that each individual column can have additional, optional constraints and modifiers placed on it. Column constraints only affect the column for which they are defined, while table constraints can be used to define a constraint across one or more columns. Constraints that involve two or more columns must be defined as table constraints.

Column constraints can be used to define a custom sort order (COLLATE collation_name). Collations determine how text values are sorted. In addition to user-defined collations, SQLite includes a NOCASE collation that ignores case when sorting.

A default value (DEFAULT value) can also be assigned. Nearly all columns have a default of NULL. You can use the DEFAULT column constraint to set a different default value. The default can either be a literal or an expression. Expressions must be placed in parentheses.

To help with date and time defaults, SQLite also includes three special keywords that may be used as a default value: CURRENT_TIME, CURRENT_DATE, and CURRENT_TIMESTAMP. These will record the UTC time, date, or date and time, respectively, when a new row is inserted. See Date and Time Features for more information on date and time functions.

Column constraints can also impose limits on a column, like denying NULL (NOT NULL) or requiring a unique value for each row (UNIQUE). Remember that a NULL is not considered a value, so UNIQUE does not imply NOT NULL, nor does UNIQUE imply only a single NULL is allowed. If you want to keep NULL assignments out of a UNIQUE column, you need to explicitly mark the column NOT NULL.

Column values can also be subjected to arbitrary user-defined constraints before they are assigned (CHECK ( expression )). Some types of constraints also allow you to specify the action to be taken in situations when the constraint would be violated. See CREATE TABLE and UPDATE in Appendix C for more specific details.

When multiple column constraints are used on a single column, the constraints are listed one after another without commas. Some examples:

CREATE TABLE parts
(
    part_id    INTEGER   PRIMARY KEY,
    stock      INTEGER   DEFAULT 0   NOT NULL,
    desc       TEXT      CHECK( desc != '' )    -- empty strings not allowed
);

In order to enforce a UNIQUE column constraint, a unique index will be automatically created over that column. A different index will be created for each column (or set of columns) marked UNIQUE. There is some expense in maintaining an index, so be aware that enforcing a UNIQUE column constraint can have performance considerations.

Primary keys

In addition to these other constraints, a single column (or set of columns) can be designated as the PRIMARY KEY. Each table can have only one primary key. Primary keys must be unique, so designating a column as PRIMARY KEY implies the UNIQUE constraint as well, and will result in an automatic unique index being created. If a column is marked both UNIQUE and PRIMARY KEY, only one index will be created.

In SQLite, PRIMARY KEY does not imply NOT NULL. This is in contradiction to the SQL standard and is considered a bug, but the behavior is so long-standing that there are concerns about fixing it and breaking existing applications. As a result, it is always a good idea to explicitly mark at least one column from each PRIMARY KEY as NOT NULL.

There are also some good design reasons for defining a primary key, which will be discussed in Tables and Keys, but the only significant, concrete thing that comes out of defining a PRIMARY KEY is the automatic unique index. There are also some minor syntax shortcuts.

If, however, the primary key column has a type that is designated as INTEGER (and very specifically INTEGER), then that column becomes the table’s “root” column.

SQLite must have some column that can be used to index the base storage for the table. In a sense, that column acts as the master index that is used to store the table itself. Like many other database systems, SQLite will silently create a hidden ROWID column for this purpose. Different database systems use different names so, in an effort to maintain compatibility, SQLite will recognize the names ROWID, OID, or _ROWID_ to reference the root column. Normally ROWID columns are not returned (even for column wildcards), nor are their values included in dump files.

If a table includes an INTEGER PRIMARY KEY column, then that column becomes an alias for the automatic ROWID column. You can still reference the column by any of the ROWID names, but you can also reference the column by its “real” user-defined name. Unlike PRIMARY KEY by itself, INTEGER PRIMARY KEY columns do have an automatic NOT NULL constraint associated with them. They are also strictly typed to only accept integer values.

There are two significant advantages of INTEGER PRIMARY KEY columns. First, because the column aliases the table’s root ROWID column, there is no need for a secondary index. The table itself acts as the index, providing efficient lookups without the maintenance costs of an external index.

Second, INTEGER PRIMARY KEY columns can automatically provide unique default values. When you insert a row without an explicit value for the ROWID (or ROWID alias) column, SQLite will automatically choose a value that is one greater than the largest existing value in the column. This provides an easy means to automatically generate unique keys. If the maximum value is reached, the database will randomly try other values, looking for an unused key.

INTEGER PRIMARY KEY columns can optionally be marked as AUTOINCREMENT. In that case, the automatically generated ID values will constantly increase, preventing the reuse of an ID value from a previously deleted row. If the maximum value is reached, insertions with automatic INTEGER PRIMARY KEY values are no longer possible. This is unlikely, however, as the INTEGER PRIMARY KEY type domain is large enough to allow 1,000 inserts per second for almost 300 million years.

When using either automatic or AUTOINCREMENT values, it is always possible to insert an explicit ROWID (or ROWID alias) value. Other than the INTEGER PRIMARY KEY designation, SQLite offers no other type of automatic sequence feature.

In addition to a PRIMARY KEY, columns can also be marked as a FOREIGN KEY. These columns reference rows in another (foreign) table. Foreign keys can be used to create links between rows in different tables. See Tables and Keys for details.

Table constraints

Table definitions can also include table-level constraints. In general, table constraints and column constraints work the same way. Table-level constraints still operate on individual rows. The main difference is that using the table constraint syntax, you can apply the constraint to a group of columns rather than just a single column. It is perfectly legal to define a table constraint with only one column, effectively defining a column constraint. Multicolumn constraints are sometimes known as compound constraints.

At the table level, SQLite supports the UNIQUE, CHECK, and PRIMARY KEY constraints. The check constraint is very similar, requiring only an expression (CHECK (expression)). Both the UNIQUE and PRIMARY KEY constraints, when given as a table constraint, require a list of columns (e.g., UNIQUE (column_name, [...]), PRIMARY KEY (column_name, [...])). As with column constraints, any table-level UNIQUE or PRIMARY KEY (which implies UNIQUE) constraint will automatically create a unique index over the appropriate columns.

Table constraints that are applied to multiple columns use the set of columns as a group. For example, when UNIQUE or PRIMARY KEY is applied across more than one column, each individual column is allowed to have duplicate values. The constraint only prevents the set of values across the designated columns from being replicated. If you wanted each individual column to also be UNIQUE, you’d need to add the appropriate constraints to the individual columns.

Consider a table that contains records of all the rooms in a multibuilding campus:

CREATE TABLE rooms
(
    room_number       INTEGER  NOT NULL,
    building_number   INTEGER  NOT NULL,
    [...,]

    PRIMARY KEY( room_number, building_number )
);

Clearly we need to allow for more than one room with the number 101. We also need to allow for more than one room in building 103. But there should only be one room 101 in building 103, so we apply the constraint across both columns. In this example, we’ve chosen to make these columns into a compound primary key, since the building number and room number combine to quintessentially define a specific room. Depending on the design of the rest of the database, it might have been equally valid to define a simple UNIQUE constraint across these two columns, and designated an arbitrary room_id column as the primary key.

Tables from queries

You can also create a table from the output of a query. This is a slightly different CREATE TABLE syntax that creates a new table and preloads it with data, all in one command:

CREATE [TEMP] TABLE table_name AS SELECT query_statement;

Using this form, you do not designate the number of columns or their names or types. Rather, the query statement is run and the output is used to define the column names and preload the new table with data. With this syntax, there is no way to designate column constraints or modifiers. Any result column that is a direct column reference will inherit the column’s affinity, but and all columns are given a NONE affinity. The query statement consists of a SELECT command. More information on SELECT can be found in Chapter 5.

Tables created in this manner are not dynamically updated—the query command is run only once when the table is created. Once the data is entered into the new table, it remains unaltered until you change it. If you need a table-like object that can dynamically update itself, use a VIEW (Views).

This example shows the optional TEMP keyword (the full word TEMPORARY can also be used) in CREATE TEMP TABLE. This modifier can be used on any variation of CREATE TABLE, but is frequently used in conjunction with the ...AS SELECT... variation shown here. Temporary tables have two specific features. First, temporary tables can only be seen by the database connection that created them. This allows the simultaneous re-use of table names without any worry of conflict between different clients. Second, all associated temporary tables are automatically dropped and deleted whenever a database connection is closed.

Generally speaking, CREATE TABLE...AS SELECT is not the best choice for creating standard tables. If you need to copy data from an old table into a new table, a better choice is to use CREATE TABLE to define an empty table with all of the appropriate column modifiers and table constraints. You can then bulk copy all the data into the new table using a variation of the INSERT command that allows for query statements. See INSERT for details.

Altering tables

SQLite supports a limited version of the ALTER TABLE command. Currently, there are only two operations supported by ALTER TABLE: add column and rename. The add column variant allows you to add new columns to an existing table. It cannot remove them. New columns are always added to the end of the column list. Several other restrictions apply.

If you need to make a more significant change while preserving as much data as possible, you can use the rename variant to rename the existing table, create a new table under the original name, and then copy the data from the old table to the new table. The old table can then be safely dropped.

For full details, see ALTER TABLE in Appendix C.

Dropping tables

The CREATE TABLE command is used to create tables and DROP TABLE is used to delete them. The DROP TABLE command deletes a table and all of the data it contains. The table definition is also removed from the database system catalogs.

The DROP TABLE command is very simple. The only argument is the name of the table you wish to drop:

DROP TABLE table_name;

In addition to deleting the table, DROP TABLE will also drop any indexes associated with the table. Both automatically created indexes (such as those used to enforce a UNIQUE constraint) as well as manually created indexes will be dropped.

Virtual tables

Virtual tables can be used to connect any data source to SQLite, including other databases. A virtual table is created with the CREATE VIRTUAL TABLE command. Although very similar to CREATE TABLE, there are important differences. For example, virtual tables cannot be made temporary, nor do they allow for an IF NOT EXISTS clause. To drop a virtual table, you use the normal DROP TABLE command.

For more information on virtual tables, including the full syntax for CREATE VIRTUAL TABLE, see Chapter 10.

Views

Views provide a way to package queries into a predefined object. Once created, views act more or less like read-only tables. Just like tables, new views can be marked as TEMP, with the same result. The basic syntax of the CREATE VIEW command is:

CREATE [TEMP] VIEW view_name AS SELECT query_statement

The CREATE VIEW syntax is almost identical to the CREATE TABLE...AS SELECT command. This is because both commands serve a similar purpose, with one important difference. The result of a CREATE TABLE command is a new table that contains a full copy of the data. The SELECT statement is run exactly once and the output of the query is stored in the newly defined table. Once created, the table will hold its own, independent copy of the data.

A view, on the other hand, is fully dynamic. Every time the view is referenced or queried, the underlying SELECT statement is run to regenerate the view. This means the data seen in a view automatically updates as the data changes in the underlying tables. In a sense, views are almost like named queries.

Views are commonly used in one of two ways. First, they can be used to package up commonly used queries into a more convenient form. This is especially true if the query is complex and prone to error. By creating a view, you can be sure to get the same query each time.

Views are also commonly used to create user-friendly versions of standard tables. A common example are tables with date and time records. Normally, any time or date value is recorded in Coordinated Universal Time, or UTC. UTC is a more proper format for dates and times because it is unambiguous and time-zone independent. Unfortunately, it can also be a bit confusing if you’re several time zones away. It is often useful to create a view that mimics the base table, but converts all the times and dates from UTC into the local time zone. This way the data in the original tables remains unchanged, but the presentation is in units that are more user-friendly.

Views are dropped with the DROP VIEW command:

DROP VIEW view_name;

Dropping a view will not have any effect on the tables it references.

Indexes

Indexes (or indices) are a means to optimize database lookups by pre-sorting and indexing one or more columns of a table. Ideally, this allows specific rows in a table to be found without having to scan every row in the table. In this fashion, indexes can provide a large performance boost to some types of queries. Indexes are not free, however, requiring updates with each modification to a table as well as additional storage space. There are even some situations when an index will cause a drop in performance. See Indexes for more information on when it makes sense to use an index.

The basic syntax for creating an index specifies the name of the new index, as well as the table and column names that are indexed. Indexes are always associated with one (and only one) table, but they can include one or more columns from that table:

CREATE [UNIQUE] INDEX index_name ON table_name ( column_name [, ...] );

Normally indexes allow duplicate values. The optional UNIQUE keyword indicates that duplicate entries are not allowed, and any attempt to insert or update a table with a nonunique value will cause an error. For unique indexes that reference more than one column, all the columns must match for an entry to be considered duplicate. As discussed with CREATE TABLE, NULL isn’t considered a value, so a UNIQUE index will not prevent one or more NULLs. If you want to prevent NULLs, you must indicate NOT NULL in the original table definition.

Each index is tied to a specific table, but they all share a common namespace. Although you can name an index anything you like, it is standard practice to name an index with a standard prefix (such as idx_), and then include the table name and possibly the names of the columns included in the index. For example:

CREATE INDEX idx_employees_name ON employees ( name );

This makes for long index names but, unlike table or view names, you typically only reference an index’s name when you create it and when you drop it.

As with all the DROP commands, the DROP INDEX command is very simple, and requires only the name of the index that is being dropped:

DROP INDEX index_name;

Dropping an index will remove the index from the database, but will leave the associated table intact.

As described earlier, the CREATE TABLE command will automatically create unique indexes to enforce a UNIQUE or PRIMARY KEY constraint. All automatic indexes will start with an sqlite_ prefix. Because these indexes are required to enforce the table definition, they cannot be manually dropped with the DROP INDEX command. Dropping the automatic indexes would alter the table behavior as defined by the original CREATE TABLE command.

Conversely, if you have manually defined a UNIQUE index, dropping that index will allow the database to insert or update redundant data. Be careful when auditing indexes and remember that not all indexes are created for performance reasons.

Get Using SQLite now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.