Wednesday, August 31, 2011

Alter waiting on Select statement to complete

A few days ago, we faced an interesting problem on one of our customer’s slave mysqld servers. An Alter for adding a new column was run on master server took 542 seconds where as it took few hours on the slave server to complete due to a SELECT blocking the Alter was not allowed to complete.

Here is the the select on the master server and it’s execution time from the binary logs.

# at 825737566
#110720 19:55:21 server id 31415  end_log_pos 825737730         Query   thread_id=83250629      exec_time=542   error_code=0
use collect/*!*/;
SET TIMESTAMP=1311216921/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=8/*!*/;
ALTER TABLE `track` ADD `source_track_id` int(10) UNSIGNED NULL DEFAULT NULL  AFTER `vendor_track_id`
/*!*/;

Alter statement completed well on the master and it got blocked by a SELECT on the slave where as the time frames of the threads in the processlist which were executing the Alter and the SELECT resembles the ALTER was started first and then the SELECT next. As per the binary logs on the master the Alter was a single statement by the thread_id and the slave started lagging.

mysql> show processlist ;
+---------+-------------+----------------------------+-------------+-------------+--------+----------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| Id      | User        | Host                       | db          | Command     | Time   | State                                                          | Info                                                                                                 |
+---------+-------------+----------------------------+-------------+-------------+--------+----------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
|     864     | system user |                            | NULL         | Connect     | 479352 | Waiting for master to send event                               | NULL                                                                                                 |
|     865     | system user |                            | collect       | Connect     |   5383 | rename result table                                          | ALTER TABLE `track` ADD `source_track_id` int(10) UNSIGNED NULL DEFAULT NULL  AFTER `vendor_track_id |
| 3277392 | ios               | intreport:56045   | collect       | Query         |   4850 | Copying to tmp table                                       | SELECT /* Reporting Query 357 */
fdr.service_provider_code AS 'Provider',
fdr.service_provider_count |
| 3279284 | uios             | solr02:47871       | collect       | Query         |   4503 | Waiting for table                                              | select track_id,
vendor_track_id,
track_name,
if(promo = 'Y', 'true', 'false') as promo_track,
if( |
.........
.........
 
 Seconds_Behind_Master: 5131

We did the possible checks to find the status of the Alter and why is it waiting at the RENAME RESULT TABLE to complete. The file level checks shows that the creation of the temp table for Alter was completed while ago.

pythian@db04:~$ ls -ltr /ssd2/mysql/collect/track.*
-rwxr-xr-x 1 mysql mysql      19153 2011-07-15 06:11 /ssd2/mysql/collect/track.frm
-rwxr-xr-x 1 mysql mysql       8980 2011-07-15 06:11 /ssd2/mysql/collect/track.TRG
-rwxr-xr-x 1 mysql mysql 2646605824 2011-07-20 20:40 /ssd2/mysql/collect/track.ibd
 
pythian@db04:~$ ls -ltr /mysql/data/collect/track.*
lrwxrwxrwx 1 mysql mysql 26 2011-07-15 08:08 /mysql/data/collect/track.TRG -> /ssd2/mysql/collect/track.TRG
lrwxrwxrwx 1 mysql mysql 26 2011-07-15 08:08 /mysql/data/collect/track.ibd -> /ssd2/mysql/collect/track.ibd
lrwxrwxrwx 1 mysql mysql 26 2011-07-15 08:08 /mysql/data/collect/track.frm -> /ssd2/mysql/collect/track.frm
 
pythian@db04:~$ ls -ltr /mysql/data/collect/*#*
-rwxr-xr-x 1 mysql mysql      19205 2011-07-20 20:04 /mysql/data/collect/#sql-7130_361.frm
-rwxr-xr-x 1 mysql mysql 2499805184 2011-07-20 20:23 /mysql/data/collect/#sql-7130_361.ibd
 
pythian@db04:~$ date
Wed Jul 20 22:13:27 PDT 2011

The show engine innodb status shows that the Alter was not started yet though it was started, created the temp table and waiting at the the RENAME TABLE operation. The SELECT query which is ACTIVE for 9168 seconds blocked the Alter statement and not allowing it to complete.


---TRANSACTION 379ED1F1F, not started, process no 28976, OS thread id 1259583824
mysql tables in use 1, locked 2
MySQL thread id 865, query id 221087068 rename result table
ALTER TABLE `track` ADD `source_track_id` int(10) UNSIGNED NULL DEFAULT NULL  AFTER `vendor_track_id`
---TRANSACTION 379ED1883, ACTIVE 9168 sec, process no 28976, OS thread id 1279453520 fetching rows, thread declared inside InnoDB 49
mysql tables in use 8, locked 0
MySQL thread id 3277392, query id 221086895 intreport 10.1.4.151 collect Copying to tmp table
SELECT /* Reporting Query 357 */
fdr.service_provider_code AS 'Provider',

A few guesses from mates saying that

1) The select has been started first so it is blocking Alter. ( Not true in this case )
2) ALTER started first and before it completes, SELECT fires, fast index creation of secondary indexes allows concurrent SELECTS but still has to get global lock during RENAME as per the explanation in docs in this blog post. But the Alter in this case is not a secondary index rebuild.

A little more digging into documentation revealed that ALTER TABLE makes a temporary copy of the original table. MySQL waits for other operations that are modifying the table, then proceeds. It incorporates the alteration into the copy, deletes the original table, and renames the new one. While ALTER TABLE is executing, the original table is readable by other sessions, but at the end ALTER still has to acquire global lock on data dictionary to swap original and new table copies, that is why we could see other threads in “Waiting for table” state. Updates and writes to the table that begin after the ALTER TABLE operation begins are stalled until the new table is ready, then are automatically redirected to the new table without any failed updates. So what happened in this situation was: ALTER started first, then concurrent SELECT started on the same table, when ALTER finished copy to a temp table it tried to RENAME table, but failed to get global lock on data dictionary. All the threads that started after this point had to wait on ALTER to finish.

More details on the Alter operation can be found here.

No comments:

Post a Comment