MySQL and the Future (the TODO)

This appendix lists the features that we plan to implement in MySQL server.

Everything in this list is approximately in the order it will be done. If you want to affect the priority order, please register a license or support us and tell us what you want to have done more quickly. See Section 1.4.

The plan is that we in the future will support the full ANSI SQL99 standard, but with a lot of useful extensions. The challenge is to do this without sacrifying the speed or compromising the code.

Things That Should Be in 4.0

We are now in the final stages of the development of the MySQL server 4.0. server. The target is to quickly implement the rest of the following features and then shift development to MySQL server 4.1. See Section 1.5.

The news section for 4.0 includes a list of the features we have already implemented in the 4.0 tree. See http://www.mysql.com/doc/N/e/News-4.0.x.html.

This section lists features not yet implemented in the current version of MySQL server 4.0, which will, however, be implemented in later versions of MySQL 4.0. This being very volatile information, please consider this list valid only if you are reading it from the MySQL web site (http://www.mysql.com/).

Allow users to change startup options without taking down the server.
Better command-line argument handling.
New key cache, which will give better performance when using many threads.
New table definition file format (.frm files). This will enable us to not run out of bits when adding more table options. One will still be able to use the old .frm file format with 4.0. All newly created tables will, however, use the new format.
The new file format will enable us to add new column types, more options for keys, and possibly to store and retrieve FOREIGN KEY definitions.
SHOW COLUMNS FROM table_name (used by mysql client to allow expansions of column names) should not open the table, only the definition file. This will require less memory and be much faster.
SET SQL_DEFAULT_TABLE_TYPE=[MyISAM | INNODB | BDB | HEAP].

Things That Should Be in 4.1

The following features are planned for inclusion into MySQL 4.1. Note that because we have many developers that are working on different projects, there will also be many additional features. There is also a small chance that some of these features will be added to MySQL 4.0. Some of the work on MySQL 4.1 is already in progress.

Subqueries. SELECT id FROM t WHERE grp IN (SELECT grp FROM g WHERE u > 100)
Foreign keys, including cascading delete.
Fail-safe replication.
Replication should work with RAND( ) and user variables @var.
Online backup with very low performance penalty. The online backup will make it easy to add a new replication slave without taking down the master.
Derived tables:
```
SELECT a.col1, b.col2
       FROM (SELECT MAX(col1) AS col1 FROM root_table) a,
       other_table b
       WHERE a.col1=b.col1;
```
This could be done by automatically creating temporary tables for the derived tables for the duration of the query.
Allow DELETE on MyISAM tables to use the record cache. To do this, we need to update the threads record cache when we update the .MYD file.
When using SET CHARACTER SET we should translate the whole query at once and not only strings. This will enable users to use the translated characters in database, table, and column names.
Add record_in_range( ) method to MERGE tables to be able to choose the right index when there are many to choose from. We should also extend the info interface to get the key distribution for each index, if analyze is run on all subtables.
RENAME TABLE on a table used in an active MERGE table may corrupt the table.
A faster, smaller embedded MySQL library. (Compatible with the old one.)
Stable openssl support. (MySQL 4.0 supports rudimentary, not 100% tested, support for openssl).
Add support for sorting on UNICODE.
Character set casts and syntax for handling multiple character sets.
Help for all commands from the client.
New faster client/server protocol which will support prepared statements, bound parameters, and bound result columns, binary transfer of data, warnings...
Add database and real table name (in case of alias) to the MYSQL_FIELD structure.
Add options to the client/server protocol to get progress notes for long running commands.
Implement RENAME DATABASE. To make this safe for all table handlers, it should work as follows:
- Create the new database.
- For every table do a rename of the table to another database, as we do with the RENAME command.
- Drop the old database.
Add true VARCHAR support (there is already support for this in MyISAM).
Optimise BIT type to take 1 bit (now BIT takes 1 char).
New internal file interface change. This will make all file handling much more general and make it easier to add extensions like RAID. (the current implementation is a hack.)
Better in-memory (HEAP) tables:
- Support for B-tree indexes
- Dynamic size rows
- Faster row handling (less copying)

Things That Must Be Done in the Real Near Future

Atomic multi-table updates—e.g., update items,month set items.price=month.price where items.id=month.id;;
Don’t allow more than a defined number of threads to run MyISAM recover at the same time.
Change INSERT ... SELECT to optionally use concurrent inserts.
Return the original field types( ) when doing SELECT MIN(column) ... GROUP BY.
Multiple result sets.
Make it possible to specify long_query_time with a granularity in microseconds.
Add a configurable prompt to the mysql command-line client, with options like database in use, time and date...
Link the myisampack code into the server.
Port of the MySQL code to QNX.
Port of the MySQL code to BeOS.
Port of the MySQL clients to LynxOS.
Add a temporary key buffer cache during INSERT/DELETE/UPDATE so that we can gracefully recover if the index file gets full.
If you perform an ALTER TABLE on a table that is symlinked to another disk, create temporary tables on this disk.
Implement a DATE/DATETIME type that handles time zone information properly so that dealing with dates in different time zones is easier.
FreeBSD and MIT-pthreads; do sleeping threads take CPU time?
Check if locked threads take any CPU time.
Fix configure so that one can compile all libraries (like MyISAM) without threads.
Add an option to periodically flush key pages for tables with delayed keys if they haven’t been used in a while.
Allow join on key parts (optimisation issue).
INSERT SQL_CONCURRENT and mysqld --concurrent-insert to do a concurrent insert at the end of the file if the file is read-locked.
Server-side cursors.
Check if lockd works with modern Linux kernels; if not, we have to fix lockd! To test this, start mysqld with --enable-locking and run the different fork* test suits. They shouldn’t give any errors if lockd works.
Allow SQL variables in LIMIT, like in LIMIT @a,@b.
Allow update of variables in UPDATE statements. For example: UPDATE TABLE foo SET @a=a+b,a=@a, b=@a+c.
Change when user variables are updated so that one can use them with GROUP BY, as in the following example: SELECT id, @a:=COUNT(*), SUM(sum_col)/@a FROM table_name GROUP BY id.
Don’t add automatic DEFAULT values to columns. Give an error when using an INSERT that doesn’t contain a column that doesn’t have a DEFAULT.
Fix libmysql.c to allow two mysql_query( ) commands in a row without reading results or give a nice error message when one does this.
Check why MIT-pthreads ctime( ) doesn’t work on some FreeBSD systems.
Add an IMAGE option to LOAD DATA INFILE to not update TIMESTAMP and AUTO_INCREMENT fields.
Added LOAD DATE INFILE ... UPDATE syntax.
- For tables with primary keys, if the data contains the primary key, entries matching that primary key are updated from the remainder of the columns. However, columns missing from the incoming data feed are not touched.
- For tables with primary keys that are missing some part of the key in the incoming data stream, or that have no primary key, the feed is treated as a LOAD DATA INFILE ... REPLACE INTO now.

Make LOAD DATA INFILE understand syntax like:

LOAD DATA INFILE 'file_name.txt' INTO TABLE tbl_name
     TEXT_FIELDS (text_field1, text_field2, text_field3)
     SET table_field1=CONCAT(text_field1, text_field2),
         table_field3=23
     IGNORE text_field3

This can be used to skip over extra columns in the text file, or update columns based on expressions of the read data.

LOAD DATA INFILE 'file_name' INTO TABLE 'table_name' ERRORS TO err_table_name. This would cause any errors and warnings to be logged into the err_table_name table. That table would have a structure like:
```
line_number    - line number in data file
error_message  - the error/warning message
and maybe
data_line      - the line from the data file
```
Automatic output from mysql to Netscape.
LOCK DATABASES (with various options.)
DECIMAL and NUMERIC types can’t read exponential numbers; Field_decimal::store(const char *from,uint len) must be recoded to fix this.
Functions: ADD_TO_SET(value,set) and REMOVE_FROM_SET(value,set).
Add use of t1 JOIN t2 ON ... and t1 JOIN t2 USING ... Currently, you can only use this syntax with LEFT JOIN.
Many more variables for show status. Records reads and updates. Selects on 1 table and selects with joins. Mean number of tables in select. Number of ORDER BY and GROUP BY queries.
If you abort mysql in the middle of a query, you should open another connection and kill the old running query. Alternatively, an attempt should be made to detect this in the server.
Add a handler interface for table information so that you can use it as a system table. This would be a bit slow if you requested information about all tables, but very flexible. SHOW INFO FROM tbl_name for basic table information should be implemented.
NATURAL JOIN.
Allow SELECT a FROM crash_me LEFT JOIN crash_me2 USING (a); in this case a is assumed to come from the crash_me table.
Fix so that ON and USING works with the JOIN join type.
Oracle-like CONNECT BY PRIOR ... to search hierarchy structures.
mysqladmin copy database new-database; requires COPY command to be added to mysqld.
Processlist should show number of queries/threads.
SHOW HOSTS for printing information about the hostname cache.
DELETE and REPLACE options to the UPDATE statement (this will delete rows when one gets a duplicate key error while updating).
Change the format of DATETIME to store fractions of seconds.
Add all missing ANSI92 and ODBC 3.0 types.
Change table names from empty strings to NULL for calculated columns.
Don’t use Item_copy_string on numerical values to avoid number->string->number conversion in case of: SELECT COUNT(*)*(id+0) FROM table_name GROUP BY id
Make it possible to use the new GNU regexp library instead of the current one (the GNU library should be much faster than the old one).
Change so that ALTER TABLE doesn’t abort clients that execute INSERT DELAYED.
Fix so that when columns are referenced in an UPDATE clause, they contain the old values from before the update started.
Add simulation of pread( )/pwrite( ) on Windows to enable concurrent inserts.
A logfile analyser that could parse out information about which tables are hit most often, how often multi-table joins are executed, etc. It should help users identify areas or table design that could be optimised to execute much more efficient queries.
Add SUM(DISTINCT).
Add ANY( ), EVERY( ), and SOME( ) group functions. In ANSI SQL these work only on boolean columns, but we can extend these to work on any columns/expressions by applying: value == 0 -> FALSE and value <> 0 -> TRUE.

Fix that the type for MAX(column) is the same as the column type:

mysql> CREATE TABLE t1 (a DATE);
mysql> INSERT INTO t1 VALUES (NOW( ));
mysql> CREATE TABLE t2 SELECT MAX(a) FROM t1;
mysql> SHOW COLUMNS FROM t2;

Come up with a nice syntax for a statement that will UPDATE the row if it exists and INSERT a new row if the row didn’t exist (like REPLACE works with INSERT / DELETE).

Things That Have to Be Done Sometime

Implement function: get_changed_tables(timeout,table1,table2,...).
Change reading through tables to use memmap when possible. Now only compressed tables use memmap.
Add a new privilege Show_priv for SHOW commands.
Make the automatic timestamp code nicer. Add timestamps to the update log with SET TIMESTAMP=#;.
Use read/write mutex in some places to get more speed.
Full foreign key support. One probably wants to implement a procedural language first.
Simple views (first on one table, later on any expression).
Automatically close some tables if a table, temporary table, or temporary files gets error 23 (not enough open files).
When one finds a field=#, change all occurrences of field to #. Now this is only done for some simple cases.
Change all const expressions with calculated expressions if possible.
Optimise key = expression. At the moment only key = field or key = constant are optimised.
Join some of the copy functions for nicer code.
Change sql_yacc.yy to an inline parser to reduce its size and get better error messages (5 days).
Change the parser to use only one rule per different number of arguments in function.
Use of full calculation names in the order part (for ACCESS97).
MINUS, INTERSECT, and FULL OUTER JOIN. (Currently UNION [in 4.0] and LEFT OUTER JOIN are supported.)
SQL_OPTION MAX_SELECT_TIME=# to put a time limit on a query.
Make the update log to a database.
Negative LIMIT to retrieve data from the end.
Alarm around client connect/read/write functions.
Please note the changes to safe_mysqld: according to FSSTND (which Debian tries to follow) PID files should go into /var/run/<progname>.pid and log files into /var/log. It would be nice if you could put the “DATADIR” in the first declaration of “pidfile” and “log”, so the placement of these files can be changed with a single statement.
Allow a client to request logging.
Add use of zlib( ) for gzip-ed files to LOAD DATA INFILE.
Fix sorting and grouping of BLOB columns (partly solved now).
Stored procedures. Triggers are also being looked at.
A simple (atomic) update language that can be used to write loops and such in the MySQL server.
Change to use semaphores when counting threads. One should first implement a semaphore library to MIT-pthreads.
Don’t assign a new AUTO_INCREMENT value when one sets a column to 0. Use NULL instead.
Add full support for JOIN with parentheses.
As an alternative for one thread/connection manage a pool of threads to handle the queries.
Allow one to get more than one lock with GET_LOCK. When doing this, one must also handle the possible deadlocks this change will introduce.