We have come across several incidents where update stats have influenced a different choices of indexes causing the change in the execution plan which indeed degrades the query performance. Some of the most frequently asked questions about stats during those incident times are

when does Innodb do auto stats update ?

InnoDB updates statistics automatically when certain table state conditions are met. Specifically, InnoDB updates statistics when one of the following occurs:

  1. During Analyse table and Optimize table operations
  2. When a table has undergone significant modifications, such as adding or deleting a large number of rows, and the statistics have become stale. As per documentation it says 10% of data change would trigger mysql to update stats anytime through out the day.

Persistent optimiser feature allows mysql to store stats persistent on disk which is controlled by a global setting innodb_stats_persistent which is enabled by default from 5.7 however it can be overridden by creating table with STATS_PERSISTENT clause. The benefits of having these statistics stored on disk is upon mysql crashes or restarts it doesn’t have to recalculate the stats again for all the tables so the recovery of the instance would be faster otherwise you would see queries waiting on statistics or see significant query degradation without valid stats.

Is there a way you can control the auto stats update ?

Auto stats update can be controlled at table level and globally at instance level. Let’s see this with an example. We created a table for testing this.

CREATE TABLE `employee2` (
  `id` int NOT NULL AUTO_INCREMENT,
  `first_name` varchar(50) NOT NULL,
  `last_name` varchar(50) NOT NULL,
  `email` varchar(100) NOT NULL,
  `birthdate` date NOT NULL,
  `added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB;

Start of life – Stats of an empty table after creation

+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| sbtest        | employee2  | PRIMARY    | 2023-04-29 06:56:21 | n_diff_pfx01 |          0 |           1 | id                                |
| sbtest        | employee2  | PRIMARY    | 2023-04-29 06:56:21 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| sbtest        | employee2  | PRIMARY    | 2023-04-29 06:56:21 | size         |          1 |        NULL | Number of pages in the index      |
| sbtest        | employee2  | email      | 2023-04-29 06:56:21 | n_diff_pfx01 |          0 |           1 | email                             |
| sbtest        | employee2  | email      | 2023-04-29 06:56:21 | n_leaf_pages |          1 |        NULL | Number of leaf pages in the index |
| sbtest        | employee2  | email      | 2023-04-29 06:56:21 | size         |          1 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
6 rows in set (0.00 sec)

Auto stats happens after first activity of loading data

Generally it takes upto 10 seconds for mysql to update the stats from the last time it updated the stats so after performing any activity give it around 10 seconds to observe the stats change.

mysql> insert into employee2 select * from employee limit 10000;
Query OK, 10000 rows affected (0.30 sec)
Records: 10000  Duplicates: 0  Warnings: 0

**** Auto stats update triggers with in 10 seconds of the desired percentage of row change ****

+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| sbtest        | employee2  | PRIMARY    | 2023-04-29 06:57:17 | n_diff_pfx01 |      10297 |          20 | id                                |
| sbtest        | employee2  | PRIMARY    | 2023-04-29 06:57:17 | n_leaf_pages |         46 |        NULL | Number of leaf pages in the index |
| sbtest        | employee2  | PRIMARY    | 2023-04-29 06:57:17 | size         |         97 |        NULL | Number of pages in the index      |
| sbtest        | employee2  | email      | 2023-04-29 06:57:17 | n_diff_pfx01 |      10038 |          20 | email                             |
| sbtest        | employee2  | email      | 2023-04-29 06:57:17 | n_leaf_pages |         32 |        NULL | Number of leaf pages in the index |
| sbtest        | employee2  | email      | 2023-04-29 06:57:17 | size         |         33 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
6 rows in set (0.00 sec)

**** However the cardinality details didn't get updated so looks like mysql is taking care of it internally at some interval ****
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table     | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| employee2 |          0 | PRIMARY  |            1 | id          | A         |           0 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| employee2 |          0 | email    |            1 | email       | A         |           0 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2 rows in set (0.00 sec)

Test Case 1 – with less than 10% data change

In this case the stats didn’t get change during this test case so it matches the statement in the documentation

mysql> insert into employee2 select * from employee limit 10000,899;
Query OK, 899 rows affected (0.06 sec)
Records: 899  Duplicates: 0  Warnings: 0

+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| sbtest        | employee2  | PRIMARY    | 2023-04-29 06:57:17 | n_diff_pfx01 |      10297 |          20 | id                                |
| sbtest        | employee2  | PRIMARY    | 2023-04-29 06:57:17 | n_leaf_pages |         46 |        NULL | Number of leaf pages in the index |
| sbtest        | employee2  | PRIMARY    | 2023-04-29 06:57:17 | size         |         97 |        NULL | Number of pages in the index      |
| sbtest        | employee2  | email      | 2023-04-29 06:57:17 | n_diff_pfx01 |      10038 |          20 | email                             |
| sbtest        | employee2  | email      | 2023-04-29 06:57:17 | n_leaf_pages |         32 |        NULL | Number of leaf pages in the index |
| sbtest        | employee2  | email      | 2023-04-29 06:57:17 | size         |         33 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
6 rows in set (0.00 sec)

Test Case 2 – More than 10% data change

Here the stats got updated after 10% or more data change which also matches with the statement in the documentation

mysql> insert into employee2 select * from employee limit 11000,1100;
Query OK, 1100 rows affected (0.03 sec)
Records: 1100 Duplicates: 0 Warnings: 0

+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| sbtest        | employee2  | PRIMARY    | 2023-04-29 07:01:35 | n_diff_pfx01 |      11107 |          20 | id                                |
| sbtest        | employee2  | PRIMARY    | 2023-04-29 07:01:35 | n_leaf_pages |         51 |        NULL | Number of leaf pages in the index |
| sbtest        | employee2  | PRIMARY    | 2023-04-29 07:01:35 | size         |         97 |        NULL | Number of pages in the index      |
| sbtest        | employee2  | email      | 2023-04-29 07:01:35 | n_diff_pfx01 |      11318 |          20 | email                             |
| sbtest        | employee2  | email      | 2023-04-29 07:01:35 | n_leaf_pages |         32 |        NULL | Number of leaf pages in the index |
| sbtest        | employee2  | email      | 2023-04-29 07:01:35 | size         |         33 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
6 rows in set (0.00 sec)

Test case 3 – disable auto stats update

We have explicitly disabled auto stats collection for employee2 table and performed 2 major data change operations targeting between 20% – 80% data change

mysql> alter table employee2 STATS_AUTO_RECALC=0;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into employee2 select * from employee limit 16740,2100;
Query OK, 2100 rows affected (0.06 sec)
Records: 2100  Duplicates: 0  Warnings: 0

mysql> insert into employee2 select * from employee limit 18840,10000;
Query OK, 10000 rows affected (0.17 sec)
Records: 10000  Duplicates: 0  Warnings: 0

+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| sbtest        | employee2  | PRIMARY    | 2023-04-29 08:15:24 | n_diff_pfx01 |      15659 |          20 | id                                |
| sbtest        | employee2  | PRIMARY    | 2023-04-29 08:15:24 | n_leaf_pages |         71 |        NULL | Number of leaf pages in the index |
| sbtest        | employee2  | PRIMARY    | 2023-04-29 08:15:24 | size         |         97 |        NULL | Number of pages in the index      |
| sbtest        | employee2  | email      | 2023-04-29 08:15:24 | n_diff_pfx01 |      15377 |          20 | email                             |
| sbtest        | employee2  | email      | 2023-04-29 08:15:24 | n_leaf_pages |         50 |        NULL | Number of leaf pages in the index |
| sbtest        | employee2  | email      | 2023-04-29 08:15:24 | size         |         97 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
6 rows in set (0.00 sec)

So here as we expected the stats didn’t get update after disabling auto stats collection at table level

Test case 4 – Analyse table

Now we ran analyse table on employee2 table and this time as expected all the persistent stats and cardinality details of the indexes got updated

mysql> analyze table employee2;
+------------------+---------+----------+----------+
| Table            | Op      | Msg_type | Msg_text |
+------------------+---------+----------+----------+
| sbtest.employee2 | analyze | status   | OK       |
+------------------+---------+----------+----------+
1 row in set (0.01 sec)

+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
| sbtest        | employee2  | PRIMARY    | 2023-04-29 08:54:44 | n_diff_pfx01 |      29029 |          20 | id                                |
| sbtest        | employee2  | PRIMARY    | 2023-04-29 08:54:44 | n_leaf_pages |        130 |        NULL | Number of leaf pages in the index |
| sbtest        | employee2  | PRIMARY    | 2023-04-29 08:54:44 | size         |        161 |        NULL | Number of pages in the index      |
| sbtest        | employee2  | email      | 2023-04-29 08:54:44 | n_diff_pfx01 |      29248 |          20 | email                             |
| sbtest        | employee2  | email      | 2023-04-29 08:54:44 | n_leaf_pages |         76 |        NULL | Number of leaf pages in the index |
| sbtest        | employee2  | email      | 2023-04-29 08:54:44 | size         |         97 |        NULL | Number of pages in the index      |
+---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+
6 rows in set (0.00 sec)

+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table     | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| employee2 |          0 | PRIMARY  |            1 | id          | A         |       29029 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| employee2 |          0 | email    |            1 | email       | A         |       29029 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2 rows in set (0.00 sec)

What happens if your stats are outdated ? Will it cause any problem ?

The queries would still use the last updated stats of the table however it is recommended to run analyse table regularly to have the stats updated at least during a maintenance window in a controlled environment. The stats being persistent and stored in table allows us to update the stats on the table manually.

What are the variables that control the innodb index statistics ?

Variable_nameDefault ValueRecommended valueDescription
innodb_stats_persistent_sample_pages20100– The number of index pages that are sampled while estimating cardinality and other statistics for an indexed column.
– Increasing the value improves the accuracy of index statistics, which can improve the query execution plan
– Increasing the samples would increase the duration it takes to calculate the stats or by Analyse table.
– This is a table and global level settings
innodb_stats_auto_recalcONON– calculate the statistics whenever more than 10% of the rows have changed.
– This setting applies to all tables created when the innodb_stats_persistent option is enabled which can be controlled at table level or global instance level
– This is a table and global level settings
innodb_stats_include_delete_markedOFF– By default, InnoDB reads uncommitted data when calculating statistics.
– Innodb excludes the deleted rows of uncommitted transaction while calculating row estimates and index statistics which causes non-optimal execution plans for other transactions
– this variable should be enabled to consider uncommitted delete rows for calculating statistics.
– This is only a global settings

Conclusion

Persistent statistics and Auto stats calculation is a very important feature of InnoDB which controls the optimal behavior of the queries. Persistent stats tables mysql.innodb_table_stats and mysql.innodb_index_stats are very important source of information which we often check whenever someone reports on degraded performance or slowness on any particular query or table. Sometimes it is important to monitor stats update on important tables to ensure the tables are having healthy stats and it is not causing any performance bottleneck. Sometimes under default configuration the tables can get outdated or sometimes with the kind of data change the table goes through the cardinality and statistics of the indexes changes which can cause severe performance issues or sometime it can potential outages as well. We were able to evaluate and predict the behaviour of 10% data change on a test environment however in live production environment this can be a very critical unpredictable event.

  • After how much activity 10% data change on a table would happen is an unpredictable event
  • During what time of the day this 10% change would reach is also an unpredictable event
  • the stats update can happen anytime during the peak times of your user activity
  • We cannot change the 10% data change mark
  • 10% data change can be too short or too fast for some applications to undergo this auto stats update
  • We don’t have any time feature to update stats at a specific time per table or at global level other than scheduling some cron.

With these many possible challenges auto stats update can be advantage however it can be a very critical source for nightmare for any DBA. If you are experiencing any degraded performance issues or you not sure if it is due to potential stats update or something else do reach out to us .Our team would be more than happy to assist you in securing and improving your database infrastructure footprint. Hope you find this information useful. Keep reading and follow us for more exclusive content for open source databases.

%d bloggers like this: