Wednesday, August 31, 2011

ALTER waiting on a SELECT statement to complete

A few days ago, we faced an interesting problem on one of our customer's slave mysqld servers. An ALTER adding a new column ran on the master server in 542 seconds, whereas it took a few hours to complete on the slave because a SELECT blocked the ALTER from finishing.

Here is the ALTER on the master server and its execution time, taken from the binary logs.

# at 825737566
#110720 19:55:21 server id 31415  end_log_pos 825737730         Query   thread_id=83250629      exec_time=542   error_code=0
use collect/*!*/;
SET TIMESTAMP=1311216921/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=8/*!*/;
ALTER TABLE `track` ADD `source_track_id` int(10) UNSIGNED NULL DEFAULT NULL  AFTER `vendor_track_id`
/*!*/;

The ALTER statement completed fine on the master but got blocked by a SELECT on the slave, even though the Time values of the two threads in the processlist show that the ALTER started first and the SELECT afterwards. As per the binary logs on the master, the ALTER was a single statement from a single thread_id, and the slave started lagging.

mysql> show processlist ;
+---------+-------------+-----------------+---------+---------+--------+----------------------------------+-------------------------------------------------------+
| Id      | User        | Host            | db      | Command | Time   | State                            | Info                                                  |
+---------+-------------+-----------------+---------+---------+--------+----------------------------------+-------------------------------------------------------+
|     864 | system user |                 | NULL    | Connect | 479352 | Waiting for master to send event | NULL                                                  |
|     865 | system user |                 | collect | Connect |   5383 | rename result table              | ALTER TABLE `track` ADD `source_track_id` int(10) UNSIGNED NULL DEFAULT NULL  AFTER `vendor_track_id |
| 3277392 | ios         | intreport:56045 | collect | Query   |   4850 | Copying to tmp table             | SELECT /* Reporting Query 357 */
fdr.service_provider_code AS 'Provider',
fdr.service_provider_count |
| 3279284 | uios        | solr02:47871    | collect | Query   |   4503 | Waiting for table                | select track_id,
vendor_track_id,
track_name,
if(promo = 'Y', 'true', 'false') as promo_track,
if( |
.........
.........
 
 Seconds_Behind_Master: 5131

We ran the possible checks to find the status of the ALTER and why it was waiting at the "rename result table" stage. The file-level checks show that the creation of the temporary table for the ALTER had completed a while ago.

pythian@db04:~$ ls -ltr /ssd2/mysql/collect/track.*
-rwxr-xr-x 1 mysql mysql      19153 2011-07-15 06:11 /ssd2/mysql/collect/track.frm
-rwxr-xr-x 1 mysql mysql       8980 2011-07-15 06:11 /ssd2/mysql/collect/track.TRG
-rwxr-xr-x 1 mysql mysql 2646605824 2011-07-20 20:40 /ssd2/mysql/collect/track.ibd
 
pythian@db04:~$ ls -ltr /mysql/data/collect/track.*
lrwxrwxrwx 1 mysql mysql 26 2011-07-15 08:08 /mysql/data/collect/track.TRG -> /ssd2/mysql/collect/track.TRG
lrwxrwxrwx 1 mysql mysql 26 2011-07-15 08:08 /mysql/data/collect/track.ibd -> /ssd2/mysql/collect/track.ibd
lrwxrwxrwx 1 mysql mysql 26 2011-07-15 08:08 /mysql/data/collect/track.frm -> /ssd2/mysql/collect/track.frm
 
pythian@db04:~$ ls -ltr /mysql/data/collect/*#*
-rwxr-xr-x 1 mysql mysql      19205 2011-07-20 20:04 /mysql/data/collect/#sql-7130_361.frm
-rwxr-xr-x 1 mysql mysql 2499805184 2011-07-20 20:23 /mysql/data/collect/#sql-7130_361.ibd
 
pythian@db04:~$ date
Wed Jul 20 22:13:27 PDT 2011

SHOW ENGINE INNODB STATUS shows the ALTER transaction as "not started" even though it had started, had created the temp table, and was waiting at the RENAME TABLE operation. The SELECT query, ACTIVE for 9168 seconds, blocked the ALTER statement and did not allow it to complete.


---TRANSACTION 379ED1F1F, not started, process no 28976, OS thread id 1259583824
mysql tables in use 1, locked 2
MySQL thread id 865, query id 221087068 rename result table
ALTER TABLE `track` ADD `source_track_id` int(10) UNSIGNED NULL DEFAULT NULL  AFTER `vendor_track_id`
---TRANSACTION 379ED1883, ACTIVE 9168 sec, process no 28976, OS thread id 1279453520 fetching rows, thread declared inside InnoDB 49
mysql tables in use 8, locked 0
MySQL thread id 3277392, query id 221086895 intreport 10.1.4.151 collect Copying to tmp table
SELECT /* Reporting Query 357 */
fdr.service_provider_code AS 'Provider',

A few guesses from colleagues:

1) The SELECT started first, so it is blocking the ALTER. (Not true in this case.)
2) The ALTER started first and, before it completed, the SELECT fired; fast index creation of secondary indexes allows concurrent SELECTs but still has to take a global lock during the RENAME, as per the explanation in the docs and in this blog post. But the ALTER in this case is not a secondary index rebuild.

A little more digging into the documentation revealed that ALTER TABLE makes a temporary copy of the original table. MySQL waits for other operations that are modifying the table, then proceeds. It incorporates the alteration into the copy, deletes the original table, and renames the new one. While ALTER TABLE is executing, the original table remains readable by other sessions, but at the end the ALTER still has to acquire a global lock on the data dictionary to swap the original and new copies, which is why we could see other threads in the "Waiting for table" state. Updates and writes to the table that begin after the ALTER TABLE operation starts are stalled until the new table is ready, then are automatically redirected to the new table without any failed updates. So what happened in this situation was: the ALTER started first, then a concurrent SELECT started on the same table; when the ALTER finished copying to the temp table it tried to RENAME the table, but failed to get the global lock on the data dictionary. All the threads that started after this point had to wait on the ALTER to finish.
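To make that sequence concrete, here is a minimal sketch of how to reproduce the behaviour on a test server, assuming a large InnoDB table; the table and column names below are hypothetical.

-- Session 1: start the ALTER on a large table; the copy phase begins
ALTER TABLE big_table ADD COLUMN extra_col INT NULL;

-- Session 2: while the copy is still running, start a long SELECT
-- (the unrestricted join is deliberately slow); reads of the original
-- table are allowed during the copy phase
SELECT SQL_NO_CACHE COUNT(*) FROM big_table a JOIN big_table b;

-- Once session 1 finishes copying, its rename blocks until session 2
-- completes, and any new query on the table shows 'Waiting for table':
SELECT COUNT(*) FROM big_table;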

More details on the ALTER operation can be found here.

Monday, June 20, 2011

Investigating Sort aborted issues

A few days ago, a few of my fellow team members and I discussed whether the "Sort aborted" error in the error log should be monitored, or whether it can be ignored because it brings no value to alert the on-call DBA, who checks the machine and mysqld server status whenever the error appears.

We identified the following possible or known causes of this error:

1) It could be generated when
a) users cancel or kill queries in the middle of a filesort
b) web servers reboot while some queries are sorting, i.e. when the connection to the database drops suddenly
2) Lack of sufficient disk space in tmpdir, which prevents the temporary file the sorting query needs from being created.

As DBAs, we can't do much about the first cause apart from informing the customer to check the application side for connection drop-outs.
The answer for the second is to change tmpdir in MySQL to point to a directory (a new one, just for this purpose) on the biggest filesystem.
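For example, a minimal my.cnf change might look like the following; the path is only an assumption for illustration, and since tmpdir is not a dynamic variable, mysqld must be restarted for it to take effect.

[perl]
# my.cnf -- point tmpdir at a dedicated directory on the largest filesystem
[mysqld]
tmpdir = /mysql/tmp
[/perl]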

3) A transaction got rolled back or aborted due to a lock wait timeout or deadlock.

In a database, a deadlock is a situation in which two or more transactions are waiting for one another to give up locks.
Even if a transaction is not involved in a deadlock, it might have to wait a considerable amount of time to obtain a lock because of a long-running transaction or transactions holding locks on the tables it needs. In such a situation, you might not want a transaction to wait indefinitely. Instead, you might want the waiting transaction to abort, or time out, after a reasonable amount of time, called a lock wait timeout.

In such cases, we have to increase the innodb_lock_wait_timeout value on the database server if we want the transaction to wait longer and complete. We can find the deadlock details in the "LATEST DETECTED DEADLOCK" section of the SHOW ENGINE INNODB STATUS\G output; the full SQL statements executed by both transactions then need to be investigated with the application dev team to identify the root cause and its resolution.
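As a quick sketch, the timeout can be checked and raised as below; the value of 120 seconds is only an example, and on older servers with the built-in InnoDB the variable is not dynamic, so it has to go into my.cnf followed by a restart.

[perl]
mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_lock_wait_timeout';  -- default is 50
mysql> SET GLOBAL innodb_lock_wait_timeout = 120;
[/perl]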

4) Unexpected errors, such as the source table or even the tmp table being corrupt, or the processing of a subquery that was also sorting having failed.

In this case, we have to repair the corrupted source table using REPAIR TABLE table_name or mysqlcheck.
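For example (the database and table names are hypothetical; note that REPAIR TABLE applies to MyISAM and a few other engines, not InnoDB):

[perl]
mysql> REPAIR TABLE mydb.mytable;
$ mysqlcheck --repair mydb mytable
[/perl]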

5) Insufficient memory allocated for sort_buffer_size: "Out of sort memory; increase server sort buffer size".

Check the sort buffer size allocated on the server and increase it a little at a time, because it is a per-connection variable and over-allocating it can cause the server itself to run out of memory.

[perl]
mysql> show global variables like '%sort%';
+---------------------------+--------------+
| Variable_name             | Value        |
+---------------------------+--------------+
| max_length_for_sort_data  | 1024         |
| max_sort_length           | 1024         |
| myisam_max_sort_file_size | 107374182400 |
| myisam_sort_buffer_size   | 36700160     |
| sort_buffer_size          | 262144       |
+---------------------------+--------------+
5 rows in set (0.14 sec)
[/perl]
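If an increase is genuinely needed, a modest step might look like this; the value is illustrative, and existing connections keep their old session setting:

[perl]
mysql> SET GLOBAL sort_buffer_size = 524288;  -- 512K, double the 256K above
[/perl]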

Apart from the above causes, a few more things need to be checked when we see this type of message in the error log:

1) How big is your biggest single table (data + index)?
2) Do you have any JOINs that involve a huge amount of data being scanned or returned?
3) What filesystem do you have on your machine?
4) Can you identify the queries that were running at that time? How long do they run?

Let's go through the above things in detail.

To find the biggest table in the database, execute the SQL query below.

[perl]
SELECT SQL_CALC_FOUND_ROWS TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE, ENGINE, TABLE_ROWS,
TRUNCATE(DATA_LENGTH/pow(1024,2),2) as 'DATA (M)',
TRUNCATE(INDEX_LENGTH/pow(1024,2),2) as 'INDEX (M)',
TRUNCATE(((DATA_LENGTH+INDEX_LENGTH))/pow(1024,2),2) AS 'TOTAL(M)'
FROM information_schema.tables
WHERE TABLE_SCHEMA not in ( 'information_schema', 'mysql')
AND TABLE_TYPE = 'BASE TABLE'
ORDER BY 8 DESC
LIMIT 1;
[/perl]

The easiest way to find the queries with JOINs that involve a huge amount of data being scanned or returned is to enable the slow query log on the database server and run the command below at your shell prompt, substituting your actual slow query log file; it lists the largest Rows_examined values found in the log.

[perl]
$less slowquery.log | grep -i "# Query_time:" | awk '{print $9}' | sort -rn | more
123949125
123947793
123858573
123855366
76087924
75933311
75776390
75691301
75671196
75556710
[/perl]

After finding the top few large scan counts, run the command below, replacing the number in the grep condition with the highest one, to see the query details.

[perl]
$less slowquery.log | grep 123949125 -A 2 -B 4 | more
[/perl]

Once we have the list of queries doing large scans, we can tune them by checking the EXPLAIN plan, creating appropriate indexes for the columns involved in the joins and the sort order, and so on.
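As a hypothetical sketch (the table, column, and index names are invented), the workflow is to confirm the bad plan and then index the filtered and sorted columns:

[perl]
EXPLAIN SELECT o.order_id, c.name
FROM orders o JOIN customers c ON c.customer_id = o.customer_id
WHERE o.created_at >= '2011-01-01'
ORDER BY o.created_at;

-- if EXPLAIN shows a full scan on orders and 'Using filesort':
ALTER TABLE orders ADD INDEX idx_created (created_at);
[/perl]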

Coming to the different filesystems and their limitations, they can be found here.
The filesystems on a Linux machine and their types can be listed by running the commands below at the shell prompt.

[perl]
[suresh@home]$ df -T
Filesystem                Type   1K-blocks      Used Available Use% Mounted on
/dev/mapper/sysvg-root    ext3   470000608  24019012 421721808   6% /
/dev/mapper/mysqlvg-mysql ext3   945112464 289180564 607148540  33% /mysql
/dev/sda1                 ext3      505604     17557    461943   4% /boot
tmpfs                     tmpfs   24724440         0  24724440   0% /dev/shm

[suresh@home]$ mount
/dev/mapper/sysvg-root on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/mapper/mysqlvg-mysql on /mysql type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
[/perl]

If we have reached the filesystem size limit for a given table, then we have to move to a filesystem with higher limits.

To find out which queries were running at that time and for how long, log the processlist into a log file periodically using the command below.

[perl]
mysqladmin --user=xxxx --password=xxxxx --sleep=300 processlist >> processlist.log
[/perl]

The above are the possible ways of finding the causes of the "Sort aborted" error and their resolution methods.

Previously, in versions earlier than 5.5.10, for queries that were aborted due to a sort problem the server wrote only the message "Sort aborted" to the error log.
In later versions, the developers realized that more detail was needed in the error log whenever the error occurs, so the mysqld server now writes a more specific, and more useful, message, such as:

[perl]
[ERROR] mysqld: Sort aborted: Out of memory (Needed 24 bytes)

[ERROR] mysqld: Sort aborted: Out of sort memory,
consider increasing server sort buffer size

[ERROR] mysqld: Sort aborted: Incorrect number of arguments for
FUNCTION test.f1; expected 0, got 1
[/perl]

In addition, if the server was started with --log-warnings=2, the server also writes information about the host, user, and query.

Wednesday, May 25, 2011

InnoDB error : Total number of locks exceeds the lock table size

Recently, one of the clients I'm working with had a problem on one of their replication slaves: an SQL statement that had executed successfully on the master gave an error while executing on the slave.

The slave stopped with the error below.

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: reporting.com
Master_User: replication
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql3308-bin-log.000488
Read_Master_Log_Pos: 340658086
Relay_Log_File: mysql3319-relay-log.000156
Relay_Log_Pos: 4983506
Relay_Master_Log_File: mysql3308-bin-log.000462
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1206
Last_Error: Error 'The total number of locks exceeds the lock table size' on query. Default database: 'production'. Query: 'create table Summary SELECT Dimension.calendarYearMonth, count(*) as EmailsSent, count(distinct Email_Sent.UserID) as "Unique Recipients"
FROM Dimension, Email_Sent
WHERE
Email_Sent.DateID = Dimension.id
AND Dimension.date >= '2010-11-01'
GROUP BY 1'
Skip_Counter: 0
Exec_Master_Log_Pos: 4983353
Relay_Log_Space: 14212286599
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1206
Last_SQL_Error: Error 'The total number of locks exceeds the lock table size' on query. Default database: 'production'. Query: 'create table Summary SELECT Dimension.calendarYearMonth, count(*) as EmailsSent, count(distinct Email_Sent.UserID) as "Unique Recipients"
FROM Dimension, Email_Sent
WHERE
Email_Sent.DateID = Dimension.id
AND Dimension.date >= '2010-11-01'
GROUP BY 1'
1 row in set (0.00 sec)


I started the slave again and monitored the running query using SHOW ENGINE INNODB STATUS\G. According to the EXPLAIN plan of the query, it was estimated to scan, and lock, about 910 million rows. The InnoDB status below showed that it had already locked 677 million rows, and the slave then stopped again with the same error.

This is an error specific to InnoDB tables. The query mentioned above attempts to set locks on all records scanned, to ensure the data returned doesn't change before creation of the new table. These locks are placed in an area of memory inside the buffer pool, whose size is set by the innodb_buffer_pool_size variable.

---TRANSACTION 5722EE5D, ACTIVE 18100 sec, process no 13052, OS thread id 1222195536 fetching rows
mysql tables in use 2, locked 2
2653776 lock struct(s), heap size 327759856, 677089582 row lock(s)
MySQL thread id 2860, query id 1863366 Copying to tmp table on disk
create table Summary SELECT Dimension.calendarYearMonth, count(*) as EmailsSent, count(distinct Email_Sent.UserID) as "Unique Recipients"
FROM Dimension, Email_Sent
WHERE
Email_Sent.DateID = Dimension.id
AND Dimension.date >= '2010-11-01'
GROUP BY 1
TABLE LOCK table `production`.`Email_Sent` trx id 5722EE5D lock mode IS
RECORD LOCKS space id 5623 page no 11 n bits 440 index `ClientID` of table `production`.`Email_Sent` trx id 5722EE5D lock mode S
TABLE LOCK table `production`.`Date_Dimension` trx id 5722EE5D lock mode IS
RECORD LOCKS space id 5611 page no 34 n bits 240 index `PRIMARY` of table `production`.`Date_Dimension` trx id 5722EE5D lock mode S locks rec but not gap
RECORD LOCKS space id 5623 page no 12 n bits 440 index `ClientID` of table `production`.`Email_Sent` trx id 5722EE5D lock mode S
RECORD LOCKS space id 5623 page no 15 n bits 272 index `ClientID` of table `production`.`Email_Sent` trx id 5722EE5D lock mode S
RECORD LOCKS space id 5623 page no 18 n bits 440 index `ClientID` of table `production`.`Email_Sent` trx id 5722EE5D lock mode S
RECORD LOCKS space id 5623 page no 21 n bits 224 index `ClientID` of table `production`.`Email_Sent` trx id 5722EE5D lock mode S
RECORD LOCKS space id 5623 page no 29 n bits 440 index `ClientID` of table `production`.`Email_Sent` trx id 5722EE5D lock mode S
RECORD LOCKS space id 5623 page no 31 n bits 440 index `ClientID` of table `production`.`Email_Sent` trx id 5722EE5D lock mode S
TOO LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS


We found that the main difference in MySQL configuration between the master and the slave was the amount of memory allocated to innodb_buffer_pool_size, because the slave was running several MySQL instances used for backup purposes.

Increasing innodb_buffer_pool_size in the configuration file to a value where the lock table can fit, and then restarting the database, should fix the problem.
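In our case the change was simply the following (a restart is required, since innodb_buffer_pool_size is not a dynamic variable on these versions):

# my.cnf on the slave -- grow the buffer pool so the lock table fits
[mysqld]
innodb_buffer_pool_size = 1G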

InnoDB uses the buffer pool to cache data and indexes of its tables, but it also uses it to store system information such as the lock table. The query sets locks on all the records it scans, and if the buffer pool is small, the InnoDB lock table may indeed grow so big that it does not fit. The lock table for the 677 million row locks was about 327MB according to the InnoDB status, so we increased innodb_buffer_pool_size from 500MB to 1GB, which was enough to hold all the locks, and restarted the database server. The SQL query then completed successfully on the slave.


More details on InnoDB row-level locks and the space they take can be found here.