The I/O performance of MySQL depends on both the storage media at the instance hardware layer (general-purpose I/O, high I/O, or ultra-high I/O EVS disks, local SSDs, etc.) and the specific SQL statements at the service layer, which determine how much data is scanned or modified.
Introduction to the InnoDB I/O System
To manage reads and writes of disk files, InnoDB implements its own set of file I/O interfaces that support both synchronous and asynchronous I/O, and it provides an independent I/O subsystem for reading and writing data pages. If a data page requested by a SQL statement is not in the buffer pool, physical I/O is generated to read or write the data stored in the file system:
• Operations to read data pages: A page read triggered by a user thread is usually a synchronous read, performed by calling the underlying read interface with synchronous I/O.
• Operations to write data pages: Page writes are protected by the doublewrite buffer by default. The write to the doublewrite buffer on disk is synchronous, while the write to the data file is asynchronous. Take flushing dirty pages as an example: background I/O threads use asynchronous I/O to flush dirty pages to disk.
In addition to reads and writes of ordinary data files, other operations also generate a large amount of I/O, such as writing redo logs, undo logs, and binlogs, sorting with temporary tables, and rebuilding tablespaces for DDL statements.
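As a quick way to observe this I/O subsystem from a client session, you can check how many background read and write I/O threads the instance is configured with and watch their activity in the FILE I/O section of the InnoDB status output. This is only an observational sketch; the thread counts depend on the instance configuration.

    -- Background threads used for asynchronous page I/O
    SHOW VARIABLES LIKE 'innodb_%io_threads';

    -- The FILE I/O section lists each I/O thread, pending asynchronous I/O
    -- requests, and cumulative reads, writes, and fsyncs
    SHOW ENGINE INNODB STATUS\G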
High I/O Caused by High Throughput
• Issue
If service concurrency is high with a high TPS/QPS, and update, delete, and insert requests are frequently issued against tables with many indexes or large fields, I/O increases significantly for both reading data and flushing dirty pages.
You can go to the Monitoring Indicators page to view read and write loads.
• Solution
To resolve the high throughput issue, it is recommended to adjust the instance parameters to reduce the disk read/write frequency, tune the parameters related to flushing dirty pages, or upgrade the instance specifications. The MySQL dirty page flushing policy is mainly controlled by the following parameters; an example of checking and adjusting them follows the descriptions:
innodb_max_dirty_pages_pct: The maximum percentage of dirty pages allowed in the buffer pool. The default value is 75. When the percentage of dirty pages exceeds this value, InnoDB writes the updated pages back to the disk files. Increasing innodb_max_dirty_pages_pct reduces the number of write operations and saves some I/O.
innodb_io_capacity: The maximum number of I/O operations per second that InnoDB performs for background tasks. This value affects the rate at which dirty pages are flushed and changes are merged from the change buffer. If the value is set too high for the available disk I/O performance, flushing can saturate the disk, causing I/O blocking and a sharp drop in InnoDB performance. The default value varies depending on the disk type.
innodb_io_capacity_max: When flushing falls behind and the percentage of dirty pages exceeds the allowed maximum, InnoDB can flush dirty pages at a rate higher than innodb_io_capacity, but not higher than innodb_io_capacity_max.
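The following is a minimal sketch of checking and adjusting these parameters at runtime. The values are illustrative only, not recommendations, and on a managed RDS instance such parameters are normally changed through the console's parameter settings rather than with SET GLOBAL (which requires the corresponding administrative privilege).

    -- Current flushing-related settings
    SHOW VARIABLES LIKE 'innodb_max_dirty_pages_pct';
    SHOW VARIABLES LIKE 'innodb_io_capacity%';

    -- Current number of dirty pages versus total pages in the buffer pool
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_total';

    -- Example adjustments (illustrative values; match them to your disk type)
    SET GLOBAL innodb_max_dirty_pages_pct = 80;
    SET GLOBAL innodb_io_capacity = 2000;
    SET GLOBAL innodb_io_capacity_max = 4000;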
Check the I/O usage metric after the adjustment. If it remains high, it is recommended to scale up the instance memory or upgrade the instance to a disk type with higher IOPS.
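To cross-check the console metric from a SQL session, the InnoDB data I/O counters give a rough view of read and write volume. They are cumulative since startup, so compare two samples taken a few seconds apart.

    -- Number of reads/writes and bytes read/written for InnoDB data files
    SHOW GLOBAL STATUS LIKE 'Innodb_data_read%';
    SHOW GLOBAL STATUS LIKE 'Innodb_data_writ%';

    -- Bytes written to the redo log
    SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';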
High I/O Caused by Temporary Tables
• Issue
In general, the temporary files generated by MySQL are very small and are released promptly. However, in some abnormal cases, such as slow queries, a large number of temporary files may be generated in a short period of time. If the temporary directory grows large, slow SQL statements are probably performing sorting or deduplication (for example, ORDER BY, GROUP BY, or DISTINCT) that creates large temporary tables. Writing data to these temporary tables also increases I/O. You can go to the Monitoring Indicators page of the instance to view temporary table creation.
• Solution
On the Slow Log page of the Console, download and view the SQL statements that execute slowly. Optimize them by analyzing their execution duration and execution plans.
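As a sketch of how to confirm that on-disk temporary tables are the source, you can check the temporary table counters and the execution plan of a suspect statement. The query below is hypothetical (the orders table and customer_id column are placeholders):

    -- How many temporary tables were created, and how many spilled to disk
    SHOW GLOBAL STATUS LIKE 'Created_tmp%';

    -- "Using temporary; Using filesort" in the Extra column indicates that the
    -- statement sorts or aggregates through a temporary table
    EXPLAIN
    SELECT customer_id, COUNT(*) AS order_cnt
    FROM orders
    GROUP BY customer_id
    ORDER BY order_cnt DESC;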
High I/O Caused by Cold Data Reads
• Issue RDS for MySQL uses a buffer pool cache architecture. If the data rows accessed or modified by a SQL statement are not found in the buffer pool, RDS needs to read the pages from the physical files in the file system, which generates a large number of read I/Os. If the buffer pool has too few free pages, some dirty pages must be written back to disk to make room, which also generates a large number of write I/Os. You can go to the Monitoring Indicators page of the instance to view the buffer pool hit ratio.
• Solution Redesign the cache policy based on your service scenario, or upgrade the instance specifications.
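A rough way to compute the hit ratio yourself, assuming MySQL 5.7 or later where the counters are exposed in performance_schema.global_status (Innodb_buffer_pool_reads counts requests that could not be served from the buffer pool and went to disk):

    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

    -- Approximate buffer pool hit ratio in percent
    SELECT 100 * (1 - disk.v / NULLIF(req.v, 0)) AS buffer_pool_hit_ratio_pct
    FROM (SELECT VARIABLE_VALUE + 0 AS v FROM performance_schema.global_status
          WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') AS disk,
         (SELECT VARIABLE_VALUE + 0 AS v FROM performance_schema.global_status
          WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests') AS req;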
High I/O Caused by Binlog Writes from Large Transactions
• Issue RDS for MySQL writes binlogs when a transaction is committed, and the binlog format is row by default. A large transaction, such as deleting a large amount of data from a big table, generates a large number of binlogs (a few dozen or more). Flushing these binlog files to disk causes a sudden spike in disk I/O usage.
• Solution It is recommended to split large transactions into smaller ones. This avoids large transactions and reduces the burst of data that must be flushed to disk at commit time.
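Below is a sketch of splitting a large purge into small transactions. The order_history table, created_at column, and 90-day retention are hypothetical; the point is that each statement commits a bounded amount of work and therefore writes a bounded amount of binlog.

    -- Run repeatedly (under autocommit each DELETE is its own small transaction)
    -- until it affects 0 rows; an index on created_at keeps each pass cheap
    DELETE FROM order_history
    WHERE created_at < NOW() - INTERVAL 90 DAY
    LIMIT 10000;

    -- Returns 0 when the purge is complete
    SELECT ROW_COUNT();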