Occasionally a MySQL client gets the catastrophic message "Lost connection to server during query"
or
"Server has gone away."
Although I hope
you will never face this problem, it’s good to be prepared in case it
happens. There are two main reasons for this problem caused by the MySQL
installation itself: server issues (most likely a crash) or the misuse of
connection options (usually timeout options or max_allowed_packet
).
We will discuss the configuration of connections in Chapter 3. Problems caused by hardware and third-party software will be touched on in Chapter 4. Here I want to briefly describe what to do if the server crashes.
First, determine whether you really had a server crash. You can do this with the help of process status monitors. If you run mysqld_safe or another daemon that restarts the server after a failure, the error log will contain a message indicating the server has been restarted. When mysqld starts, it always prints something like this to the error logfile:
110716 14:01:44 [Note] /apps/mysql-5.1/libexec/mysqld: ready for connections. Version: '5.1.59-debug' socket: '/tmp/mysql51.sock' port: 3351 Source distribution
So if you find such a message, the server has been restarted. If there is no message and the server is up and running, a lost connection is most likely caused by the misuse of connection options, which will be discussed in Chapter 3.
Note
If you remember when your MySQL server was originally
started, you can use the status variable uptime
,
which shows the number of seconds since the server started:
mysql> SHOW GLOBAL STATUS LIKE 'uptime';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Uptime | 10447 |
+---------------+-------+
1 row in set (0.00 sec)
This information will also help when you want to check whether mysqld failure was not caused by an operating system restart. Just compare the value of this variable with the operating system uptime.
The reason why I rely on error logfiles comes from my job experience with cases when customers notice a problem hours after a server crash and even some time after a planned mysqld restart.
If you confirm that the server has restarted, you should examine the error log again and look for information about the crash itself. Usually you can derive enough information about the crash from the error log to avoid the same situation in the future. We’ll discuss how to investigate the small number of difficult cases you may encounter in Chapter 6. Now let’s go back to the error logfile and see examples of its typical contents in case of a server crash. I’ll list a large extract here:
Version: '5.1.39' socket: '/tmp/mysql_sandbox5139.sock' port: 5139 MySQL Community Server (GPL) 091002 14:56:54 - mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=8384512 read_buffer_size=131072 max_used_connections=1 max_threads=151 threads_connected=1 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 338301 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x69e1b00 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x450890f0 thread_stack 0x40000 /users/ssmirnova/blade12/5.1.39/bin/mysqld(my_print_stacktrace+0x2e)[0x8ac81e] /users/ssmirnova/blade12/5.1.39/bin/mysqld(handle_segfault+0x322)[0x5df502] /lib64/libpthread.so.0[0x3429e0dd40] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN6String4copyERKS_+0x16)[0x5d9876] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN14Item_cache_str5storeEP4Item+0xc9) [0x52ddd9] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN26select_singlerow_subselect9send_ dataER4ListI4ItemE+0x45) [0x5ca145] /users/ssmirnova/blade12/5.1.39/bin/mysqld[0x6386d1] /users/ssmirnova/blade12/5.1.39/bin/mysqld[0x64236a] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN4JOIN4execEv+0x949)[0x658869] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN30subselect_single_select_ engine4execEv+0x36c)[0x596f3c] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN14Item_subselect4execEv+0x26)[0x595d96] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN24Item_singlerow_subselect8val_ realEv+0xd)[0x595fbd] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN14Arg_comparator18compare_real_ fixedEv+0x39)[0x561b89] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN12Item_func_ne7val_intEv+0x23)[0x568fb3] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN4JOIN8optimizeEv+0x12ef)[0x65208f] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_ LISTjR4ListIS1_ES2_jP8 st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xa0) [0x654850] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z13handle_selectP3THDP6st_lexP13select_ resultm+0x16c) [0x65a1cc] /users/ssmirnova/blade12/5.1.39/bin/mysqld[0x5ecbda] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z21mysql_execute_commandP3THD+0x602) [0x5efdd2] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x357) [0x5f52f7] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z16dispatch_command19enum_server_ commandP3THDPcj+0xe93) [0x5f6193] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z10do_commandP3THD+0xe6)[0x5f6a56] /users/ssmirnova/blade12/5.1.39/bin/mysqld(handle_one_connection+0x246)[0x5e93f6] /lib64/libpthread.so.0[0x3429e061b5] /lib64/libc.so.6(clone+0x6d)[0x34292cd39d]`) Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x6a39e60 = select 1 from `t1` where `c0` <> (select geometrycollectionfromwkb(`c3`) from `t1`) thd->thread_id=2 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what
The key line indicating the reason for the crash is:
091002 14:56:54 - mysqld got signal 11 ;
This means the MySQL server was killed after it asked the operating
system for a resource (e.g., access
to a file or RAM), getting an error with code 11. In most operating
systems, this signal refers to a segmentation fault. You can find more
detailed information in the user manual for your operating system. Run
man signal
for Unix and Linux. In Windows, similar
cases will usually generate a log message like “mysqld got exception
0xc0000005.” Search the Windows user manual for the meanings of these
exception codes.
The following is the excerpt from the log about a query that was running in the thread that crashed the server:
Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x6a39e60 = SELECT 1 FROM `t1` WHERE `c0` <> (SELECT geometrycollectionfromwkb(`c3`) FROM `t1`) thd->thread_id=2 thd->killed=NOT_KILLED
To diagnose, rerun the query to see whether it was the cause of the crash:
mysql> SELECT 1 FROM `t1` WHERE `c0` <> (SELECT
geometrycollectionfromwkb(`c3`) FROM `t1`);
ERROR 2013 (HY000): Lost connection to MySQL server during query
Note
When I recommend repeating problems, I assume you will use the development server and not your production server. We will discuss how to safely troubleshoot in an environment dedicated to this purpose in Sandboxes. Please don’t try to repeat this example; it is based on a known bug #47780 that’s fixed in current versions. The fix exists since versions 5.0.88, 5.1.41, 5.5.0, and 6.0.14.
At this point, you have identified the source of the crash and confirmed this was really the source, but you have to rewrite the query so it does not cause a crash next time. Now we can get help from the backtrace that was printed in the log:
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x450890f0 thread_stack 0x40000 /users/ssmirnova/blade12/5.1.39/bin/mysqld(my_print_stacktrace+0x2e)[0x8ac81e] /users/ssmirnova/blade12/5.1.39/bin/mysqld(handle_segfault+0x322)[0x5df502] /lib64/libpthread.so.0[0x3429e0dd40] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN6String4copyERKS_+0x16)[0x5d9876] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN14Item_cache_str5storeEP4Item+0xc9) [0x52ddd9] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN26select_singlerow_subselect9 send_dataER4ListI4ItemE+0x45) [0x5ca145] /users/ssmirnova/blade12/5.1.39/bin/mysqld[0x6386d1] /users/ssmirnova/blade12/5.1.39/bin/mysqld[0x64236a] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN4JOIN4execEv+0x949)[0x658869] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN30subselect_single_select_engine4execEv +0x36c)[0x596f3c] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN14Item_subselect4execEv+0x26)[0x595d96] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN24Item_singlerow_subselect8val_realEv +0xd)[0x595fbd] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN14Arg_comparator18compare_real_fixedEv +0x39)[0x561b89] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN12Item_func_ne7val_intEv+0x23)[0x568fb3] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN4JOIN8optimizeEv+0x12ef)[0x65208f] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_ LISTjR4ListIS1_ES2_jP8 st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xa0) [0x654850] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z13handle_selectP3THDP6st_lexP13select_ resultm+0x16c) [0x65a1cc] /users/ssmirnova/blade12/5.1.39/bin/mysqld[0x5ecbda] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z21mysql_execute_commandP3THD+0x602) [0x5efdd2] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x357) [0x5f52f7] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z16dispatch_command19enum_server_ commandP3THDPcj+0xe93) [0x5f6193] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_Z10do_commandP3THD+0xe6)[0x5f6a56] /users/ssmirnova/blade12/5.1.39/bin/mysqld(handle_one_connection+0x246)[0x5e93f6] /lib64/libpthread.so.0[0x3429e061b5] /lib64/libc.so.6(clone+0x6d)[0x34292cd39d]`)
The relevant lines are the calls to Item_subselect
and Item_singlerow_subselect
:
/users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN14Item_subselect4execEv+0x26)[0x595d96] /users/ssmirnova/blade12/5.1.39/bin/mysqld(_ZN24Item_singlerow_subselect8val_realEv +0xd)[0x595fbd]
How did I decide these were the culprits? In this case, I recognized
the calls from my previous troubleshooting. But a good rule of thumb is to
start from the top. These first functions are usually operating system
calls, which can be relevant, but are of no help in these circumstances,
because you cannot do anything with them, and then follow calls to the
MySQL library. Examine these from top to bottom to find which ones you can
affect. You can’t do anything with String4copy
or Item_cache_str5store
, for instance, but you can rewrite a subselect,
so we’ll start from there.
Here, even without looking into the source code for mysqld, we can play around to find the cause of
the crash. It’s a good guess that the use of a subquery is the problem
because subqueries can be converted easily to JOIN
. Let’s try rewriting the query and testing
it:
mysql> SELECT 1 FROM `t1` WHERE `c0` <> geometrycollectionfromwkb(`c3`);
Empty set (0.00 sec)
The new query does not crash, so all you need to do is change the query in the application to match.
Here I want to add a note about bugs. When you are faced with a crash and identify the reason, check the MySQL bug database for similar problems. When you find a bug that looks like the one you hit, check whether it has been fixed, and if so, upgrade your server to the version where the bug was fixed (or a newer one). This can save you time because you won’t have to fix the problematic query.
If you can’t find a bug similar to one you hit, try downloading the latest version of the MySQL server and running your query. If the bug is reproducible there, report it. It’s important to use the latest versions of stable general availability (GA) releases because they contain all current bug fixes and many old issues won’t reappear. Chapter 6 discusses how to safely test crashes in sandbox environments.
Crashes can be caused not only by particular queries, but also by the environment in which the server runs. The most common reason is the lack of free RAM. This happens particularly when the user allocates huge buffers. As I noted before, mysqld always needs slightly more RAM than the sum of all buffers. Usually the error logfile contains a rough estimation of the RAM that can be used. It looks like this:
key_buffer_size=235929600 read_buffer_size=4190208 max_used_connections=17 max_connections=2048 threads_connected=13 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 21193712 K ----- 21193712K ~= 20G
Such estimations are not precise, but still worth checking. The one just shown claims that mysqld can use up to 20G RAM! You can get powerful boxes nowadays, but it is worth checking whether you really have 20G RAM.
Another issue in the environment is other applications that run along with the MySQL server. It is always a good idea to dedicate a server for MySQL in production because other applications can use resources that you expect MySQL to use. We will describe how to debug the effects of other applications on mysqld in Chapter 4.
Get MySQL Troubleshooting now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.