Data Manipulation Language

The Data Manipulation Language is all about getting user data in and out of the database. After all the data structures and other database objects have been created with DDL commands, DML commands can be used to load those data structures full of useful data.

The DML supported by SQLite falls into two basic categories. The first category consists of the “update” commands, which includes the actual UPDATE command, as well as the INSERT and DELETE commands. As you might guess, these commands are used to update (or modify), insert, and delete the rows of a table. All of these commands alter the stored data in some way. The update commands are the primary means of managing all the data within a database.

The second category consists of the “query” commands, which are used to extract data from the database. Actually, there is only one query command: SELECT. The SELECT command not only prints returned values, but provides a great number of options to combine different tables and rows and otherwise manipulate data before returning the final result.

SELECT is, unquestionably, the most complex SQL command. It is also, arguably, the most important SQL command. This chapter will only cover the very basics of SELECT, and then we will spend the next chapter going through all of its parts, bit by bit. To address the full command syntax in detail, SELECT gets a whole chapter to itself (Chapter 5).

Row Modification Commands

There are three commands used for adding, modifying, and removing data from the database. INSERT adds new rows, UPDATE modifies existing rows, and DELETE removes rows. These three commands are used to maintain all of the actual data values within the database. All three update commands operate at a row level, adding, altering, or removing the specified rows. Although all three commands are capable of acting on multiple rows, each command can only directly act upon rows contained within a single table.

INSERT

The INSERT command is used to create new rows in the specified table. There are two meaningful versions of the command. The first version uses a VALUES clause to specify a list of values to insert:

INSERT INTO table_name (column_name [, ...]) VALUES (new_value [, ...]);

A table name is provided, along with a list of columns and a list of values. Both lists must have the same number of items. A single new row is created and each value is recorded into its respective column. The columns can be listed in any order, just as long as the list of columns and the list of values line up correctly. Any columns that are not listed will receive their default values:

INSERT INTO parts ( name, stock, status ) VALUES ( 'Widget', 17, 'IN STOCK' );

In this example, we attempt to insert a new row into a “parts” table. Note the use of single quotes for text literals.

Technically, the list of column names is optional. If no explicit list of columns is provided, the INSERT command will attempt to pair up values with the table’s full list of columns:

INSERT INTO table_name VALUES (new_value [, ...]);

The trick with this format is that the number and order of values must exactly match the number and order of columns in the table definition. That means it is impossible to use default values, even on INTEGER PRIMARY KEY columns. More often than not, this is not actually desirable. This format is also harder to maintain within application source code, since it must be meticulously updated if the table format changes. In general, it is recommended that you always explicitly list out the columns in an INSERT statement.

When bulk importing data, it is common to loop over data sets, calling INSERT over and over. Processing these statements one at a time can be fairly slow, since each command will update both the table and any relevant indexes, and then make sure the data is fully written out to physical disk before (finally!) starting the next INSERT. This is a fairly lengthly process, since it requires physical I/O.

To speed up bulk inserts, it is common to wrap groups of 1,000 to 10,000 INSERT statements into a single transaction. Grouping the statement together will substantially increase the overall speed of the inserts by delaying the physical I/O until the end of the transaction. See Transaction Control Language for more information on transactions.

Note

Bulk inserts can be sped up by wrapping large groups of INSERT commands inside a transaction.

The second version of INSERT allows you to define values by using a query statement. This is very similar to the CREATE TABLE...AS SELECT command, although the table must already exist. This is the only version of INSERT that can insert more than one row with a single command:

INSERT INTO table_name (column_name, [...]) SELECT query_statement;

This type of INSERT is most commonly used to bulk copy data from one table to another. This is a common operation when you need to update the definition of a table, but you don’t want to lose all the data that already exists in the database. The old table is renamed, the new table is defined, and the data is copied from the old table into the new table using an INSERT INTO...SELECT command. This form can also be used to populate temporary tables or copy data from one attached database to another.

As with the VALUES version of INSERT, the column list is technically optional but, for all the same reasons, it is still recommended that you provide an explicit column list.

All versions of the INSERT command also support an optional conflict resolution clause. This conflict clause determines what should be done if the results of the INSERT would violate a database constraint. The most common example is INSERT OR REPLACE, which comes into play when the INSERT would, as executed, cause a UNIQUE constraint violation. If the REPLACE conflict resolution is present, any existing row that would cause a UNIQUE constraint violation is first deleted, and then the INSERT is allowed to continue. This specific usage pattern is so common that the whole INSERT OR REPLACE phrase can be replaced by just REPLACE. For example, REPLACE INTO table_name....

See INSERT and UPDATE in Appendix C for more information on the details of conflict resolution.

UPDATE

The UPDATE command is used to assign new values to one or more columns of existing rows in a table. The command can update more than one row, but all of the rows must be part of the same table. The basic syntax is:

UPDATE table_name SET column_name=new_value [, ...] WHERE expression

The command requires a table name followed by a list of column name/value pairs that should be assigned. Which rows are updated is determined by a conditional expression that is tested against each row of the table. The most common usage pattern uses the expression to check for equality on some unique column, such as a PRIMARY KEY column.

Caution

If no WHERE condition is given, the UPDATE command will attempt to update the designated columns in every row of a table.

It is not considered an error if the WHERE expression evaluates to false for every row in the table, resulting in no actual updates.

Here is a more specific example:

-- Update the price and stock of part_id 454:
UPDATE parts SET price = 4.25, stock = 75 WHERE part_id = 454;

This example assumes that the table parts has at least three columns: price, stock, and part_id. The database will find each row with a part_id of 454. In this case, it can be assumed that part_id is a PRIMARY KEY column, so only one row will be updated. The price and stock columns of that row are then assigned new values.

The full syntax for UPDATE can be found at UPDATE in Appendix C.

DELETE

As you might guess, the DELETE command is used to delete or remove one or more rows from a single table. The rows are completely deleted from the table:

DELETE FROM table_name WHERE expression;

The command requires only a table name and a conditional expression to pick out rows. The WHERE expression is used to select specific rows to delete, just as it is used in the UPDATE command.

Caution

If no WHERE condition is given, the DELETE command will attempt to delete every row of a table.

As with UPDATE, it is not considered an error if the WHERE expression evaluates to false for every row in the table, resulting in no actual deletions.

Some specific examples:

-- Delete the row with rowid 385:
DELETE FROM parts WHERE part_id = 385;

-- Delete all rows with a rowid greater than or equal to 43
-- and less than or equal to 246:
DELETE FROM parts WHERE part_id >= 43 AND part_id <= 246;

These examples assume we have a table named parts that contains at least one unique column named part_id.

As noted, if no WHERE clause is given, the DELETE command will attempt to delete every row in a table. SQLite optimizes this specific case, truncating the full table, rather than processing each individual row. Truncating the table is much faster than deleting each individual row, but truncation bypasses the individual row processing. If you wish to process each row as it is deleted, provide a WHERE clause that always evaluates to true:

DELETE FROM parts WHERE 1; -- delete all rows, force per-row processing

The existence of the WHERE clause will prevent the truncation, allowing each row to be processed in turn.

The Query Command

The final DML command to cover is the SELECT command. SELECT is used to extract or return values from the database. Almost any time you want to extract or return some kind of value, you’ll need to use the SELECT command. Generally, the returned values are derived from the contents of the database, but SELECT can also be used to return the value of simple expressions. This is a great way to test out expressions, for example:

sqlite> SELECT 1+1, 5*32, 'abc'||'def', 1>2;
1+1         5*32        'abc' || 'def'  1>2       
----------  ----------  --------------  ----------
2           160         abcdef          0         

SELECT is a read-only command, and will not modify the database (unless the SELECT is embedded in a different command, such as a CREATE TABLE...AS SELECT or an INSERT INTO...SELECT).

Without question, SELECT is the most complex SQL command, both in terms of syntax as well as function. The SELECT syntax tries to represent a generic framework that is capable of expressing a great many different types of queries. While it is somewhat successful at this, there are areas where SELECT has traded away simplicity for more flexibility. As a result, SELECT has a large number of optional clauses, each with its own set of options and formats.

Understanding how to mix and match these optional clauses to get the result you’re looking for can take some time. While the most basic syntax can be shown with a good set of examples, to really wrap your head around SELECT, it is best to understand how it actually works and what it is trying to accomplish.

Because SELECT can be so complex, and because SELECT is an extremely important command, we will spend the whole next chapter looking very closely at SELECT and each of its clauses. There will be some discussion about what is going on behind the scenes, to provide more insight into how to read and write complex queries.

For now, we’ll just give you a taste. That should provide enough information to play around with the other commands in this chapter. The most basic form of SELECT is:

SELECT output_list FROM input_table WHERE row_filter;

The output list is a list of expressions that should be evaluated and returned for each resulting row. Most commonly, this is simply a list of columns. The output list can also include a wildcard (*) that indicates all known columns should be returned.

The FROM clause defines the source of the table data. The next chapter will show how tables can be linked and joined, but for now we’ll stick with querying one table at a time.

The WHERE clause is a conditional filtering expression that is applied to each row. It is essentially the same as the WHERE clause in the UPDATE and DELETE commands. Those rows that evaluate to true will be part of the result, while the other rows will be filtered out.

Consider this table:

sqlite> CREATE TABLE tbl ( a, b, c, id INTEGER PRIMARY KEY );
sqlite> INSERT INTO tbl ( a, b, c ) VALUES ( 10, 10, 10 );
sqlite> INSERT INTO tbl ( a, b, c ) VALUES ( 11, 15, 20 );
sqlite> INSERT INTO tbl ( a, b, c ) VALUES ( 12, 20, 30 );

We can return the whole table like this:

sqlite> SELECT * FROM tbl;
a           b           c           id         
----------  ----------  ----------  ----------
10          10          10          1         
11          15          20          2         
12          20          30          3         

We can also just return specific columns:

sqlite> SELECT a, c FROM tbl;
a           c         
----------  ----------
10          10        
11          20        
12          30        

Or specific rows:

sqlite> SELECT * FROM tbl WHERE id = 2;
a           b           c           id        
----------  ----------  ----------  ----------
11          15          20          2         

For more specifics, see Chapter 5 and SELECT in Appendix C.

Get Using SQLite now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.