
Maximizing Database Query Efficiency for MySQL – Part One

Paul Namuag


Slow, inefficient, or long-running queries are problems that regularly plague DBAs. They are ubiquitous, and an inevitable part of life for anyone responsible for managing a database.

Poor database design can affect the efficiency of queries and their performance. Lack of knowledge about, or improper use of, function calls, stored procedures, or routines can also degrade database performance and can even harm the entire MySQL database cluster.

In master-slave replication, a very common cause of these issues is tables which lack primary or secondary indexes. This causes slave lag which, in the worst-case scenario, can last for a very long time.

In this two-part blog series, we’ll give you a refresher course on how to maximize your database queries in MySQL to drive better efficiency and performance.

Always Add a Unique Index To Your Table

Tables that do not have primary or unique keys typically create huge problems as data grows. When that happens, a simple data modification can stall the database. If proper indexes are lacking and an UPDATE or DELETE statement is applied to the particular table, a full table scan will be chosen as the query plan by MySQL. That can cause high disk I/O for reads and writes and degrade the performance of your database. See an example below:

root[test]> show create table sbtest2\G

*************************** 1. row ***************************

       Table: sbtest2

Create Table: CREATE TABLE `sbtest2` (

  `id` int(10) unsigned NOT NULL,

  `k` int(10) unsigned NOT NULL DEFAULT '0',

  `c` char(120) NOT NULL DEFAULT '',

  `pad` char(60) NOT NULL DEFAULT ''

) ENGINE=InnoDB DEFAULT CHARSET=latin1

1 row in set (0.00 sec)



root[test]> explain extended update sbtest2 set k=52, pad="xx234xh1jdkHdj234" where id=57;

+----+-------------+---------+------------+------+---------------+------+---------+------+---------+----------+-------------+

| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |

+----+-------------+---------+------------+------+---------------+------+---------+------+---------+----------+-------------+

|  1 | UPDATE      | sbtest2 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1923216 |   100.00 | Using where |

+----+-------------+---------+------------+------+---------------+------+---------+------+---------+----------+-------------+

1 row in set, 1 warning (0.06 sec)

Whereas a table with a primary key has a very good query plan:

root[test]> show create table sbtest3\G

*************************** 1. row ***************************

       Table: sbtest3

Create Table: CREATE TABLE `sbtest3` (

  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,

  `k` int(10) unsigned NOT NULL DEFAULT '0',

  `c` char(120) NOT NULL DEFAULT '',

  `pad` char(60) NOT NULL DEFAULT '',

  PRIMARY KEY (`id`),

  KEY `k` (`k`)

) ENGINE=InnoDB AUTO_INCREMENT=2097121 DEFAULT CHARSET=latin1

1 row in set (0.00 sec)



root[test]> explain extended update sbtest3 set k=52, pad="xx234xh1jdkHdj234" where id=57;

+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+

| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |

+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+

|  1 | UPDATE      | sbtest3 | NULL       | range | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | Using where |

+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+

1 row in set, 1 warning (0.00 sec)

Primary or unique keys are a vital component of a table’s structure, especially when performing maintenance on a table. For example, tools from the Percona Toolkit (such as pt-online-schema-change or pt-table-sync) require that a unique key be present. Keep in mind that a PRIMARY KEY is already a unique key, and that a primary key cannot hold NULL values whereas a unique key can. Defining a primary key column that allows NULL values causes an error like:

ERROR 1171 (42000): All parts of a PRIMARY KEY must be NOT NULL; if you need NULL in a key, use UNIQUE instead
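If a table such as sbtest2 above was created without a primary key, one simple fix (assuming its id values are in fact unique and contain no NULLs, otherwise the statement will fail) is to promote that column to a primary key:

ALTER TABLE sbtest2 ADD PRIMARY KEY (id);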

For slave nodes, it is also common that on certain occasions the primary/unique key is not present on the table, which therefore is a discrepancy in the table structure. You can use mysqldiff to detect this, or you can run mysqldump with the --no-data … parameters on both nodes and run a diff to compare the table structures and check if there’s any discrepancy.
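A minimal sketch of the mysqldump approach (the host names, user, and database name below are placeholders for your own environment):

# dump table definitions only from the master and the slave, then compare them
mysqldump --no-data -h master-host -u dbuser -p mydb > master_schema.sql
mysqldump --no-data -h slave-host -u dbuser -p mydb > slave_schema.sql
diff master_schema.sql slave_schema.sql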

Scan Tables With Duplicate Indexes, Then Drop Them

Duplicate indexes can also cause performance degradation, especially when the table contains a huge number of records. MySQL has to evaluate more candidate query plans when optimizing a query, which includes scanning larger index distributions or statistics, and that adds performance overhead as it can cause memory contention or high I/O and memory utilization.

Query degradation when duplicate indexes are present on a table also contributes to saturating the buffer pool. This can likewise affect the performance of MySQL when checkpointing flushes the transaction logs to disk. This is due to the processing and storing of an unwanted index (which is in fact a waste of space in the particular tablespace of that table). Take note that duplicate indexes are also stored in the tablespace, and therefore also have to be cached in the buffer pool.

Take a look at the table below which contains multiple duplicate keys:

root[test]#> show create table sbtest3\G

*************************** 1. row ***************************

       Table: sbtest3

Create Table: CREATE TABLE `sbtest3` (

  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,

  `k` int(10) unsigned NOT NULL DEFAULT '0',

  `c` char(120) NOT NULL DEFAULT '',

  `pad` char(60) NOT NULL DEFAULT '',

  PRIMARY KEY (`id`),

  KEY `k` (`k`,`pad`,`c`),

  KEY `kcp2` (`id`,`k`,`c`,`pad`),

  KEY `kcp` (`k`,`c`,`pad`),

  KEY `pck` (`pad`,`c`,`id`,`k`)

) ENGINE=InnoDB AUTO_INCREMENT=2048561 DEFAULT CHARSET=latin1

1 row in set (0.00 sec)

and has a size of 2.3GiB

root[test]#> ! du -hs /var/lib/mysql/test/sbtest3.ibd

2.3G    /var/lib/mysql/test/sbtest3.ibd

Let’s drop the duplicate indexes and rebuild the table with a no-op ALTER:

root[test]#> drop index kcp2 on sbtest3; drop index kcp on sbtest3; drop index pck on sbtest3;

Query OK, 0 rows affected (0.01 sec)

Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.01 sec)

Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.01 sec)

Records: 0  Duplicates: 0  Warnings: 0



root[test]#> alter table sbtest3 engine=innodb;

Query OK, 0 rows affected (28.23 sec)

Records: 0  Duplicates: 0  Warnings: 0



root[test]#> ! du -hs /var/lib/mysql/test/sbtest3.ibd

945M    /var/lib/mysql/test/sbtest3.ibd

That saved roughly 59% of the old size of the tablespace, which is a huge gain.

To find duplicate indexes, you can use pt-duplicate-key-checker from the Percona Toolkit to handle the job for you.
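For example, a minimal run against a single database looks roughly like this (host, user, and database name are placeholders):

pt-duplicate-key-checker --host localhost --user root --ask-pass --databases test

It reports the duplicate or redundant keys it finds, along with ALTER TABLE … DROP INDEX statements you can use to remove them.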

Tune Up Your Buffer Pool

For this section I’m referring only to the InnoDB storage engine. 

The buffer pool is an important component within the InnoDB kernel space. This is where InnoDB caches table and index data as they are accessed. It speeds up processing because frequently used data is stored efficiently in memory using a BTREE structure. For instance, if you have multiple tables totalling >= 100GiB that are accessed heavily, then we suggest provisioning fast volatile memory starting from a size of 128GiB and assigning about 80% of the physical memory to the buffer pool. That 80% has to be monitored efficiently. You can use SHOW ENGINE INNODB STATUS\G, or you can leverage monitoring software such as ClusterControl, which offers fine-grained monitoring that includes the buffer pool and its relevant health metrics. Also set the innodb_buffer_pool_instances variable accordingly. You might set it larger than 8 (the default when innodb_buffer_pool_size >= 1GiB), such as 16, 24, 32, 64, or higher if necessary.

When monitoring the buffer pool, check the global status variable Innodb_buffer_pool_pages_free, which gives you an idea of whether there is a need to adjust the buffer pool, or whether there are unwanted or duplicate indexes consuming the buffer. SHOW ENGINE INNODB STATUS\G also offers a more detailed view of the buffer pool, including per-instance information based on the number of innodb_buffer_pool_instances you have set.
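A minimal sketch of how you might check and adjust this from the client (the size below is purely illustrative and must be tuned against your own RAM and workload):

-- check whether the pool still has free pages
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_free';

-- innodb_buffer_pool_size is dynamic as of MySQL 5.7, so it can be resized online
SET GLOBAL innodb_buffer_pool_size = 100 * 1024 * 1024 * 1024;

-- innodb_buffer_pool_instances is not dynamic; set it in my.cnf (e.g. innodb_buffer_pool_instances = 16) and restart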

Use FULLTEXT Indexes (But Only If Applicable)

Using queries like,

SELECT bookid, page, context FROM books WHERE context like '%for dummies%';

wherein context is a string-type (char, varchar, text) column, is an example of a really bad query! Pulling a large set of records with a leading-wildcard filter like this ends up in a full table scan, and that is just crazy. Consider using a FULLTEXT index instead. FULLTEXT indexes have an inverted index design. Inverted indexes store a list of words and, for each word, a list of documents that the word appears in. To support proximity search, positional information for each word is also stored, as a byte offset.

In order to use FULLTEXT for searching or filtering data, you need to use the MATCH() … AGAINST() syntax rather than LIKE as in the query above. Of course, you need to make the field in question your FULLTEXT index field.
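For example, the earlier query could be rewritten roughly like this, assuming a FULLTEXT index exists on the context column (the quoted phrase in Boolean mode mimics the original '%for dummies%' filter):

SELECT bookid, page, context FROM books WHERE MATCH(context) AGAINST('"for dummies"' IN BOOLEAN MODE);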

To create a FULLTEXT index, just specify FULLTEXT as the index type. See the example below:

root[minime]#> CREATE FULLTEXT INDEX aboutme_fts ON users_info(aboutme);

Query OK, 0 rows affected, 1 warning (0.49 sec)

Records: 0  Duplicates: 0  Warnings: 1



root[jbmrcd_date]#> show warnings;

+---------+------+--------------------------------------------------+

| Level   | Code | Message                                          |

+---------+------+--------------------------------------------------+

| Warning |  124 | InnoDB rebuilding table to add column FTS_DOC_ID |

+---------+------+--------------------------------------------------+

1 row in set (0.00 sec)

Although FULLTEXT indexes can offer benefits when searching for words within very large text stored in a column, they also create issues when used incorrectly.

When doing a FULLTEXT search against a large table that is constantly accessed (where a number of client requests are searching for different, unique keywords), it can be very CPU intensive.

There are also certain occasions where FULLTEXT is not applicable; see this external blog post. Although I haven’t tried this with 8.0, I don’t see any changes relevant to it. We suggest that you do not use FULLTEXT for searching in a big-data environment, especially on high-traffic tables. Otherwise, try to leverage other technologies such as Apache Lucene, Apache Solr, tsearch2, or Sphinx.

Avoid Using NULL in Columns

Columns that contain NULL values are totally fine in MySQL. But if columns holding NULL values are part of an index, it can affect query performance, as the optimizer cannot provide the right query plan due to poor index distribution. However, there are certain ways to optimize queries that involve NULL values, but of course only if this suits the requirements. Please check the MySQL documentation on IS NULL optimization. You may also check this external post, which is helpful as well.
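As a small, hypothetical illustration, declaring indexed columns NOT NULL with a sensible default (where the data model allows it) keeps the index distribution cleaner:

CREATE TABLE user_profiles (
  user_id INT UNSIGNED NOT NULL,
  country CHAR(2) NOT NULL DEFAULT '',
  last_login DATETIME NULL,
  PRIMARY KEY (user_id),
  KEY idx_country (country)
) ENGINE=InnoDB;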

Design Your Schema Topology and Table Structures Efficiently

To some extent, normalizing your database tables from 1NF (First Normal Form) to 3NF (Third Normal Form) benefits query efficiency because normalized tables tend to avoid redundant records. Proper planning and design of your tables is very important because this determines how you retrieve or pull data, and every one of these actions has a cost. With normalized tables, the goal is to ensure that every non-key column in every table is directly dependent on the key; the whole key and nothing but the key. If this goal is reached, it pays off in the form of reduced redundancy, fewer anomalies, and improved efficiency.
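As a small, hypothetical sketch, instead of repeating customer details on every order row, a 3NF design keeps them in their own table and references them by key:

CREATE TABLE customers (
  customer_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(100) NOT NULL,
  email VARCHAR(255) NOT NULL,
  PRIMARY KEY (customer_id)
) ENGINE=InnoDB;

CREATE TABLE orders (
  order_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  customer_id INT UNSIGNED NOT NULL,
  order_date DATETIME NOT NULL,
  total DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (order_id),
  KEY idx_customer (customer_id)
) ENGINE=InnoDB;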

While normalizing your tables has many benefits, it doesn’t mean you need to normalize all your tables in this way. You can also implement a design for your database using a Star Schema. Designing your tables using a Star Schema gives you simpler queries (avoiding complex cross joins), makes it easy to retrieve data for reporting, and offers performance gains because there’s no need for unions or complex joins, as well as fast aggregations. A Star Schema is simple to implement, but you need to plan carefully because it can create big problems and disadvantages when your tables get bigger and require maintenance. A Star Schema (and its underlying tables) is prone to data integrity issues, so there is a high probability that a bunch of your data will be redundant. If you expect these tables to stay constant (in structure and design) and they are designed for query efficiency, then it’s an ideal case for this approach.

Mixing your database designs (as long as you are able to determine and identify what kind of data has to be pulled from your tables) is very important, since you benefit from more efficient queries and it also helps the DBA with backups, maintenance, and recovery.

Get Rid of Constant and Old Data

We recently wrote about Best Practices for Archiving Your Database in the Cloud. It covers how you can take advantage of data archiving before it goes to the cloud. So how does getting rid of old data, or archiving your constant and old data, help query efficiency? As stated in that blog, for larger tables that are constantly modified and have new data inserted, the tablespace can grow quickly. MySQL and InnoDB perform efficiently when records or data are contiguous to each other and have significance to the next row in the table. Meaning, if you have no old records that no longer need to be used, the optimizer does not have to include them in its statistics, yielding a much more efficient result. Makes sense, right? Also, query efficiency does not matter only on the application side; it also needs to be considered during a backup, maintenance, or failover. For example, a bad, long-running query can affect your maintenance period or a failover, and that can be a problem.
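If you use the Percona Toolkit, pt-archiver is one way to move old rows out in small batches. A rough, hypothetical example (the hosts, schema, table, and created_at column are placeholders):

pt-archiver --source h=prod-host,D=mydb,t=orders --dest h=archive-host,D=mydb_archive,t=orders --where "created_at < NOW() - INTERVAL 1 YEAR" --limit 1000 --commit-each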

Enable Query Logging As Needed

Always set up your MySQL slow query log in accordance with your needs. If you are using Percona Server, you can take advantage of its extended slow query logging, which allows you to define certain additional variables. You can filter types of queries in combination, such as full_scan, full_join, tmp_table, etc. You can also dictate the rate of slow query logging through the variable log_slow_rate_type, among many others.

The importance of enabling query logging in MySQL (such as the slow query log) is that it lets you inspect your queries so you can optimize or tune MySQL by adjusting the variables that suit your requirements. To enable the slow query log, ensure that these variables are set up:

  • long_query_time – assign the right value for how long queries can take before being logged. If a query takes more than 10 seconds (the default), it will land in the slow query log file you assigned.
  • slow_query_log – to enable it, set it to 1.
  • slow_query_log_file – this is the destination path for your slow query log file.
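A minimal sketch of turning it on at runtime (the threshold and path are illustrative; persist the same settings in my.cnf so they survive a restart):

SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 2;
SET GLOBAL slow_query_log_file = '/var/lib/mysql/slow-query.log';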

The slow query log is very helpful for query analysis and for diagnosing bad queries that cause stalls, slave delays, long run times, heavy memory or CPU consumption, or that even cause the server to crash. If you use pt-query-digest or pt-index-usage, use the slow query log file as the source for reporting on these queries.
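For instance, a basic pt-query-digest run against the slow log (the path is whatever you set in slow_query_log_file) looks roughly like this:

pt-query-digest /var/lib/mysql/slow-query.log > slow_query_report.txt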

Conclusion

In this blog, we have discussed some ways you can maximize database query efficiency. In the next part, we’ll discuss even more factors which can help you maximize performance. Stay tuned!
