Query-Related Issues

Case Sensitivity in Searches

By default, MySQL searches are case-insensitive (although there are some character sets that are never case-insensitive, such as czech). That means that if you search with col_name LIKE 'a%', you will get all column values that start with A or a. If you want to make this search case-sensitive, use something like INSTR(col_name, "A")=1 to check a prefix. Or use STRCMP(col_name, "A") = 0 if the column value must be exactly "A".

Simple comparison operations (>=, >, = , < , <=, sorting, and grouping) are based on each character’s “sort value”. Characters with the same sort value (like E, e, and é) are treated as the same character!

In older MySQL versions LIKE comparisons where done on the uppercase value of each character (E == e but E <> é). In newer MySQL versions LIKE works just like the other comparison operators.

If you want a column always to be treated in case-sensitive fashion, declare it as BINARY. See Section 6.5.3.

If you are using Chinese data in the so-called big5 encoding, you want to make all character columns BINARY. This works because the sorting order of big5 encoding characters is based on the order of ASCII codes.

Problems Using DATE Columns

The format of a DATE value is 'YYYY-MM-DD'. According to ANSI SQL, no other format is allowed. You should use this format in UPDATE expressions and in the WHERE clause of SELECT statements. For example:

mysql> SELECT * FROM tbl_name WHERE date >= '1997-05-05';

As a convenience, MySQL automatically converts a date to a number if the date is used in a numeric context (and vice versa). It is also smart enough to allow a “relaxed” string form when updating and in a WHERE clause that compares a date to a TIMESTAMP, DATE, or DATETIME column. (Relaxed form means that any punctuation character may be used as the separator between parts. For example, '1998-08-15' and '1998#08#15' are equivalent.) MySQL can also convert a string containing no separators (such as '19980815'), provided it makes sense as a date.

The special date '0000-00-00' can be stored and retrieved as '0000-00-00'. When using a '0000-00-00' date through MyODBC, it will automatically be converted to NULL in MyODBC Versions 2.50.12 and above because ODBC can’t handle this kind of date.

Because MySQL performs these conversions, the following statements work:

mysql> INSERT INTO tbl_name (idate) VALUES (19970505);
mysql> INSERT INTO tbl_name (idate) VALUES ('19970505');
mysql> INSERT INTO tbl_name (idate) VALUES ('97-05-05');
mysql> INSERT INTO tbl_name (idate) VALUES ('1997.05.05');
mysql> INSERT INTO tbl_name (idate) VALUES ('1997 05 05');
mysql> INSERT INTO tbl_name (idate) VALUES ('0000-00-00');

mysql> SELECT idate FROM tbl_name WHERE idate >= '1997-05-05';
mysql> SELECT idate FROM tbl_name WHERE idate >= 19970505;
mysql> SELECT MOD(idate,100) FROM tbl_name WHERE idate >= 19970505;
mysql> SELECT idate FROM tbl_name WHERE idate >= '19970505';

However, the following will not work:

mysql> SELECT idate FROM tbl_name WHERE STRCMP(idate,'19970505')=0;

STRCMP( ) is a string function, so it converts idate to a string and performs a string comparison. It does not convert '19970505' to a date and perform a date comparison.

Note that MySQL does not check whether the date is correct. If you store an incorrect date, such as '1998-2-31', the wrong date will be stored. If the date cannot be converted to any reasonable value, a 0 is stored in the DATE field. This is mainly a speed issue and we think it is up to the application to check the dates, not the server.

Problems with NULL Values

The concept of the NULL value is a common source of confusion for newcomers to SQL, who often think that NULL is the same thing as an empty string "". This is not the case! For example, the following statements are completely different:

mysql> INSERT INTO my_table (phone) VALUES (NULL);
mysql> INSERT INTO my_table (phone) VALUES ("");

Both statements insert a value into the phone column, but the first inserts a NULL value and the second inserts an empty string. The meaning of the first can be regarded as “phone number is not known” and the meaning of the second can be regarded as “she has no phone”.

In SQL, the NULL value is always false in comparison to any other value, even NULL. An expression that contains NULL always produces a NULL value unless otherwise indicated in the documentation for the operators and functions involved in the expression. All columns in the following example return NULL:

mysql> SELECT NULL,1+NULL,CONCAT('Invisible',NULL);

If you want to search for column values that are NULL, you cannot use the =NULL test. The following statement returns no rows because expr = NULL is FALSE, for any expression:

mysql> SELECT * FROM my_table WHERE phone = NULL;

To look for NULL values, you must use the IS NULL test. The following shows how to find the NULL phone number and the empty phone number:

mysql> SELECT * FROM my_table WHERE phone IS NULL;
mysql> SELECT * FROM my_table WHERE phone = "";

In MySQL, as in many other SQL servers, you can’t index columns that can have NULL values. You must declare such columns NOT NULL. Conversely, you cannot insert NULL into an indexed column.

When reading data with LOAD DATA INFILE, empty columns are updated with "". If you want a NULL value in a column, you should use \N in the text file. The literal word 'NULL' may also be used under some circumstances. See Section 6.4.9.

When using ORDER BY, NULL values are presented first. If you sort in descending order using DESC, NULL values are presented last. When using GROUP BY, all NULL values are regarded as equal.

To help with NULL handling, you can use the IS NULL and IS NOT NULL operators and the IFNULL( ) function.

For some column types, NULL values are handled specially. If you insert NULL into the first TIMESTAMP column of a table, the current date and time are inserted. If you insert NULL into an AUTO_INCREMENT column, the next number in the sequence is inserted.

Problems with alias

You can use an alias to refer to a column in the GROUP BY, ORDER BY, or HAVING part. Aliases can also be used to give columns better names:

SELECT SQRT(a*b) as rt FROM table_name GROUP BY rt HAVING rt > 0;
SELECT id,COUNT(*) AS cnt FROM table_name GROUP BY id HAVING cnt > 0;
SELECT id AS "Customer identity" FROM table_name;

Note that ANSI SQL doesn’t allow you to refer to an alias in a WHERE clause. This is because when the WHERE code is executed the column value may not yet be determined. For example, the following query is illegal:

SELECT id,COUNT(*) AS cnt FROM table_name WHERE cnt > 0 GROUP BY id;

The WHERE statement is executed to determine which rows should be included in the GROUP BY part while HAVING is used to decide which rows from the result set should be used.

Deleting Rows from Related Tables

As MySQL doesn’t support sub-selects or use of more than one table in the DELETE statement, you should use the following approach to delete rows from 2 related tables:

SELECT the rows based on some WHERE condition in the main table.
DELETE the rows in the main table based on the same condition.
DELETE FROM related_table WHERE related_column IN (selected_rows).

If the total number of characters in the query with related_column is more than 1,048,576 (the default value of max_allowed_packet), you should split it into smaller parts and execute multiple DELETE statements. You will probably get the fastest DELETE by only deleting 100-1000 related_column id’s per query if the related_column is an index. If the related_column isn’t an index, the speed is independent of the number of arguments in the IN clause.

Solving Problems with No Matching Rows

If you have a complicated query that has many tables and that doesn’t return any rows, you should use the following procedure to find out what is wrong with your query:

Test the query with EXPLAIN and check if you can find something that is obviously wrong. See Section 5.2.1.
Select only those fields that are used in the WHERE clause.
Remove one table at a time from the query until it returns some rows. If the tables are big, it’s a good idea to use LIMIT 10 with the query.
Do a SELECT for the column that should have matched a row against the table that was last removed from the query.
If you are comparing FLOAT or DOUBLE columns with numbers that have decimals, you can’t use '='. This problem is common in most computer languages because floating-point values are not exact values. In most cases, changing the FLOAT to a DOUBLE will fix this. See Section A.5.7.
If you still can’t figure out what’s wrong, create a minimal test that can be run with mysql test < query.sql that shows your problems. You can create a test file with mysqldump --quick database tables > query.sql. Open the file in an editor, remove some insert lines (if there are too many of these), and add your select statement at the end of the file.
Test that you still have your problem by doing:
```
shell> mysqladmin create test2
shell> mysql test2 < query.sql
```
Post the test file using mysqlbug to mysql@lists.mysql.com.

Problems with Floating-Point Comparison

Floating-point numbers cause confusion sometimes because these numbers are not stored as exact values inside computer architecture. What one can see on the screen usually is not the exact value of the number.

Field types FLOAT, DOUBLE, and DECIMAL are such:

CREATE TABLE t1 (i INT, d1 DECIMAL(9,2), d2 DECIMAL(9,2));
INSERT INTO t1 VALUES (1, 101.40, 21.40), (1, -80.00, 0.00),
(2, 0.00, 0.00), (2, -13.20, 0.00), (2, 59.60, 46.40),
(2, 30.40, 30.40), (3, 37.00, 7.40), (3, -29.60, 0.00),
(4, 60.00, 15.40), (4, -10.60, 0.00), (4, -34.00, 0.00),
(5, 33.00, 0.00), (5, -25.80, 0.00), (5, 0.00, 7.20),
(6, 0.00, 0.00), (6, -51.40, 0.00);

mysql> SELECT i, SUM(d1) AS a, SUM(d2) AS b
    -> FROM t1 GROUP BY i HAVING a <> b;

+------+--------+-------+
| i    | a      | b     |
+------+--------+-------+
|    1 |  21.40 | 21.40 |
|    2 |  76.80 | 76.80 |
|    3 |   7.40 |  7.40 |
|    4 |  15.40 | 15.40 |
|    5 |   7.20 |  7.20 |
|    6 | -51.40 |  0.00 |
+------+--------+-------+

The result is correct. Although the first five records look like they shouldn’t pass the comparison test, they may do so because the difference between the numbers shows up around the tenth decimal or so, depending on the computer’s architecture.

The problem cannot be solved by using ROUND( ) (or a similar function) because the result is still a floating-point number. Example:

mysql> SELECT i, ROUND(SUM(d1), 2) AS a, ROUND(SUM(d2), 2) AS b
    -> FROM t1 GROUP BY i HAVING a <> b;
+------+--------+-------+
| i    | a      | b     |
+------+--------+-------+
|    1 |  21.40 | 21.40 |
|    2 |  76.80 | 76.80 |
|    3 |   7.40 |  7.40 |
|    4 |  15.40 | 15.40 |
|    5 |   7.20 |  7.20 |
|    6 | -51.40 |  0.00 |
+------+--------+-------+

This is what the numbers in row ‘a’ look like:

mysql> SELECT i, ROUND(SUM(d1), 2)*1.0000000000000000 AS a,
    -> ROUND(SUM(d2), 2) AS b FROM t1 GROUP BY i HAVING a <> b;
+------+----------------------+-------+
| i    | a                    | b     |
+------+----------------------+-------+
|    1 |  21.3999999999999986 | 21.40 |
|    2 |  76.7999999999999972 | 76.80 |
|    3 |   7.4000000000000004 |  7.40 |
|    4 |  15.4000000000000004 | 15.40 |
|    5 |   7.2000000000000002 |  7.20 |
|    6 | -51.3999999999999986 |  0.00 |
+------+----------------------+-------+

Depending on the computer architecture you may or may not see similar results. Each CPU may evaluate floating-point numbers differently. For example, in some machines you may get ‘right’ results by multiplying both arguments by 1, an example follows.

WARNING: NEVER TRUST THIS METHOD IN YOUR APPLICATION. THIS IS AN EXAMPLE OF A WRONG METHOD!!!

mysql> SELECT i, ROUND(SUM(d1), 2)*1 AS a, ROUND(SUM(d2), 2)*1 AS b
    -> FROM t1 GROUP BY i HAVING a <> b;
+------+--------+------+
| i    | a      | b    |
+------+--------+------+
|    6 | -51.40 | 0.00 |
+------+--------+------+

The reason this example seems to be working is that on the particular machine where the test was done, the CPU floating-point arithmetics happens to round the numbers to be the same, but there is no rule that any CPU should do so, so it cannot be trusted.

The correct way to do floating-point number comparison is to first decide on what is the wanted tolerance between the numbers and then do the comparison against the tolerance number. For example, if we agree that floating-point numbers should be regarded as equal, if they are same in the precision of one in ten thousand (0.0001), the comparsion should be done like this:

mysql> SELECT i, SUM(d1) AS a, SUM(d2) AS b FROM t1
    -> GROUP BY i HAVING ABS(a - b) > 0.0001;
+------+--------+------+
| i    | a      | b    |
+------+--------+------+
|    6 | -51.40 | 0.00 |
+------+--------+------+
1 row in set (0.00 sec)

And vice versa—if we wanted to get rows where the numbers are the same, the test would be:

mysql> SELECT i, SUM(d1) AS a, SUM(d2) AS b FROM t1
    -> GROUP BY i HAVING ABS(a - b) < 0.0001;
+------+-------+-------+
| i    | a     | b     |
+------+-------+-------+
|    1 | 21.40 | 21.40 |
|    2 | 76.80 | 76.80 |
|    3 |  7.40 |  7.40 |
|    4 | 15.40 | 15.40 |
|    5 |  7.20 |  7.20 |
+------+-------+-------+

Get MySQL Reference Manual now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

MySQL Reference Manual by Michael Widenius, David Axmark, Kaj Arno