When MySQL query optimization is mentioned, indexes are one of the first things that get covered. Today, we will try to see why they are so important.
What are Indexes?
In general, an index is an alphabetical list of records with references to the pages on which they are mentioned. In MySQL, an index is a data structure used to quickly find rows. Indexes are also called keys, and they are critical for good performance: as the data grows, using indexes properly becomes more and more important. Using indexes is one of the most powerful ways to improve query performance – when indexes are used properly, queries can become tens or even hundreds of times faster.
Today, we will try to explain the basic benefits and drawbacks of using indexes in MySQL. Keep in mind that MySQL indexes alone deserve an entire book so this post will not cover absolutely everything, but it will be a good starting point. For those who are interested in how indexes work on a deeper level, reading the book Relational Database Index Design and the Optimizers by Tapio Lahdenmäki and Michael Leach should provide more insight.
The Benefits of Using Indexes
There are a few main benefits of using indexes in MySQL and these are as follows:
- Indexes allow MySQL to quickly find rows matching a WHERE clause;
- Indexes might help queries avoid searching through certain rows, thus reducing the amount of data the server needs to examine – if there is a choice between multiple indexes, MySQL usually uses the most selective index, that is, the index that matches the smallest number of rows;
- Indexes might be used in order to retrieve rows from other tables in JOIN operations;
- Indexes might be used to find the minimum or the maximum value of a specific column that uses an index;
- Indexes might be used to sort or group a table if the operations are performed on a leftmost prefix of an index – similarly, a leftmost prefix of a multiple-column index might also be used by the query optimizer to look up rows;
- Indexes might also be used to save disk I/O – when a covering index is in use, a query can return values straight from the index structure saving disk I/O.
Similarly, there are multiple types of indexes:
- INDEX is a type of index where values do not need to be unique. This type of index accepts NULL values;
- UNIQUE INDEX is frequently used to keep duplicate values out of a table – this type of index allows developers to enforce the uniqueness of row values;
- FULLTEXT INDEX is an index that is applied on fields that utilize full text search capabilities. This type of index finds keywords in the text instead of directly comparing values to the values in the index;
- DESCENDING INDEX is an index that stores rows in a descending order – the query optimizer will choose this type of an index when a descending order is requested by the query. This index type was introduced in MySQL 8.0;
- PRIMARY KEY is also an index. In a nutshell, the PRIMARY KEY is a column or a set of columns that identifies each row in a table – frequently used together with fields having an AUTO_INCREMENT attribute. This type of index does not accept NULL values. MySQL does allow the values in a PRIMARY KEY to be updated, but since they identify the rows, they are best treated as fixed once set.
Now, we will try to go through both the benefits and the drawbacks of using indexes in MySQL. We will start with the probably most frequently discussed upside – speeding up queries that match a WHERE clause.
Speeding up Queries Matching a WHERE Clause
Indexes are frequently used to speed up search queries that match a WHERE clause. The reason why an index makes such search operations faster is pretty simple – queries that use an index avoid a full table scan.
In order to speed up queries that match a WHERE clause, you can make use of the EXPLAIN statement in MySQL. The statement EXPLAIN SELECT should provide you with some insight into how the MySQL query optimizer executes the query – it also shows whether the query in question uses an index and, if so, which one. Take a look at the following query explanation:
mysql> EXPLAIN SELECT * FROM demo_table WHERE field_1 = 'Demo'\G
*************************** 1. row ***************************
<...>
possible_keys: NULL
key: NULL
key_len: NULL
<...>
The above query does not use an index. However, if we add an index on “field_1”, the index would be used successfully:
mysql> EXPLAIN SELECT * FROM demo_table WHERE field_1 = 'Demo'\G
*************************** 1. row ***************************
<...>
possible_keys: field_1
key: field_1
key_len: 43
<...>
The possible_keys column describes the possible indexes that MySQL can choose, the key column describes the index actually chosen and the key_len column describes the length of the chosen key.
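For reference, the index behind the second explanation could be added with a statement along the following lines. This is only a minimal sketch: the index name field_1 is an assumption taken from the key column in the output above.

```sql
-- Add an ordinary (non-unique) index on field_1.
-- The index name field_1 is assumed to match the key shown by EXPLAIN.
CREATE INDEX field_1 ON demo_table(field_1);

-- Re-run the explanation and check that the key column is no longer NULL.
EXPLAIN SELECT * FROM demo_table WHERE field_1 = 'Demo';
```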
In this case, MySQL would perform a lookup of the values in the index and return any rows containing the specified value – as a result, the query would be faster. Although indexes do help certain queries to be faster, there are a couple of things that you need to keep in mind if you want your indexes to help your queries:
- Isolate your columns – MySQL cannot use an index on a column that is not isolated, that is, a column that is part of an expression or wrapped in a function. For example, a query like this wouldn’t use an index:
SELECT field_1 FROM demo_table WHERE field_1 + 5 = 10;
In order to solve this, leave the indexed column alone on one side of the comparison – move any arithmetic to the other side and simplify your query as much as possible;
- Avoid using LIKE queries with a leading wildcard – in this case, MySQL will not use an index because the leading wildcard means that there can be anything before the text. If you must use LIKE queries with wildcards and want the queries to make use of indexes, make sure that the wildcard is at the end of the search pattern.
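The two rules above can be sketched as follows, assuming an index exists on field_1. The search values are made up purely for illustration.

```sql
-- Not isolated: the column is part of an expression, so MySQL cannot
-- use the index to seek to the matching rows.
SELECT field_1 FROM demo_table WHERE field_1 + 5 = 10;

-- Isolated: the arithmetic is moved to the constant side of the comparison.
SELECT field_1 FROM demo_table WHERE field_1 = 10 - 5;

-- Leading wildcard: the index cannot narrow down the search.
SELECT field_1 FROM demo_table WHERE field_1 LIKE '%emo';

-- Trailing wildcard: the index can be used for a prefix match.
SELECT field_1 FROM demo_table WHERE field_1 LIKE 'Dem%';
```

Running EXPLAIN on each pair should show the difference in the key and rows columns.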
Of course, speeding up queries that match a WHERE clause can also be done in other ways (for example, partitioning), but for the sake of simplicity, we won’t be looking further into that in this post.
What we are interested in, however, are the different types of indexes, so we’ll look into those now.
Getting rid of Duplicate Values in a Column – UNIQUE Indexes
The purpose of a UNIQUE index in MySQL is to enforce the uniqueness of the values in a column. To use a UNIQUE index run a CREATE UNIQUE INDEX query:
CREATE UNIQUE INDEX demo_index ON demo_table(demo_column);
You can also create a unique index when you create a table:
CREATE TABLE demo_table (
    `demo_column` VARCHAR(100) NOT NULL,
    UNIQUE KEY(demo_column)
);
That’s all it takes to add a unique index to a table. Now, when you try to add a duplicate value to the table MySQL will come back with the following error:
#1062 - Duplicate entry ‘Demo’ for key ‘demo_column’
FULLTEXT Indexes
A FULLTEXT index is an index applied to columns that use full text search capabilities. This type of index has many unique capabilities including stopwords and search modes.
The InnoDB stopword list has 36 words while the MyISAM stopword list has 143. In InnoDB, the stopwords are derived from the table set in the variable innodb_ft_user_stopword_table; if this variable is not set, they are derived from the table set in the innodb_ft_server_stopword_table variable. If neither of those two variables is set, InnoDB uses the built-in list. To see the default InnoDB stopword list, query the INNODB_FT_DEFAULT_STOPWORD table in INFORMATION_SCHEMA.
In MyISAM, the stopwords are derived from the storage/myisam/ft_static.c file. The ft_stopword_file variable enables the default stopword list to be changed. Stopwords will be disabled if this variable is set to an empty string, but keep in mind that if this variable defines a file, the defined file is not parsed for comments – MyISAM will treat all of the words found in the file as stopwords.
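As a rough sketch, inspecting the default InnoDB list and switching to a custom one could look like this. The schema name demo_db, the table name demo_stopwords and the stopword itself are hypothetical. InnoDB expects the stopword table to be an InnoDB table with a single VARCHAR column named value.

```sql
-- Show the built-in InnoDB stopword list.
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;

-- Hypothetical custom list: an InnoDB table with one VARCHAR column named value.
CREATE TABLE demo_db.demo_stopwords (value VARCHAR(30)) ENGINE = InnoDB;
INSERT INTO demo_db.demo_stopwords (value) VALUES ('demo');

-- Point InnoDB at the custom list using the db_name/table_name format.
SET GLOBAL innodb_ft_server_stopword_table = 'demo_db/demo_stopwords';
```

A changed stopword list only takes effect for FULLTEXT indexes that are created or rebuilt afterwards.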
FULLTEXT indexes are also known for their unique search modes:
- If a FULLTEXT search query with no modifiers is run, a natural language mode will be activated. The natural language mode can also be activated by using the IN NATURAL LANGUAGE MODE modifier;
- The WITH QUERY EXPANSION modifier enables a search mode with query expansion. Such a search mode works by performing the search twice: for the second search, the original search phrase is expanded with the few most relevant documents from the first search. In general, this modifier is useful when the user relies on implied knowledge (for example, the user might search for “database” and hope to see “InnoDB” and “MyISAM” in the result set);
- The IN BOOLEAN MODE modifier allows searching with boolean operators. For example, the +, - and * operators each accomplish a different task – the + operator requires that the value be present in a row, the - operator requires that the value not be present and the * operator acts as a wildcard.
A query that uses a FULLTEXT index looks like so:
SELECT * FROM demo_table WHERE MATCH(demo_field) AGAINST('value' IN NATURAL LANGUAGE MODE);
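The other search modes described above can be sketched in the same way. The index name demo_fulltext and the search terms are assumptions made for illustration only.

```sql
-- Add a FULLTEXT index (the index name demo_fulltext is hypothetical).
CREATE FULLTEXT INDEX demo_fulltext ON demo_table(demo_field);

-- Boolean mode: rows must contain "mysql" and must not contain "oracle".
-- Words starting with "index" are optional but raise the relevance of a row.
SELECT * FROM demo_table
WHERE MATCH(demo_field) AGAINST('+mysql -oracle index*' IN BOOLEAN MODE);

-- Query expansion: the search runs twice, the second time with an expanded phrase.
SELECT * FROM demo_table
WHERE MATCH(demo_field) AGAINST('database' WITH QUERY EXPANSION);
```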
Keep in mind that FULLTEXT indexes are only useful for MATCH() AGAINST() operations – they do not help with ordinary comparisons in a WHERE clause, so columns that are filtered that way still need other types of indexes.
It is also worth mentioning that FULLTEXT indexes have a minimum word length. In InnoDB, by default only words of at least three characters are indexed and searchable (the innodb_ft_min_token_size variable) – this limit is increased to four characters in the MyISAM storage engine (the ft_min_word_len variable).
DESCENDING Indexes
A DESCENDING index is an index in which InnoDB stores the entries in descending order – the query optimizer will use such an index when a descending order is requested by the query. Such an index can be added to a column by running a query like the one below:
CREATE INDEX descending_index ON demo_table(column_name DESC);
An ascending index can also be added to a column – just replace DESC with ASC.
PRIMARY KEY Indexes
A PRIMARY KEY serves as a unique identifier for each row in a table. A column with a PRIMARY KEY must contain unique values – no NULL values are allowed to be used either. If a duplicate value is added to a column which has a PRIMARY KEY, MySQL will respond with an error #1062:
#1062 - Duplicate entry ‘Demo’ for key ‘PRIMARY’
If a NULL value is added to the column, MySQL will respond with an error #1048:
#1048 - Column ‘id’ cannot be null
In InnoDB, the primary key is also the clustered index of the table (we discuss clustered indexes later).
Multicolumn Indexes
You can also create indexes on multiple columns at once – such indexes are called multicolumn indexes.
Indexes on multiple columns are often misunderstood – sometimes developers and DBAs index all of the columns separately or index them in the wrong order. Column order is one of the most common causes of confusion in this space: there are no “this way or the highway” solutions, because the correct order depends on the queries that use the index. Choose the column order so that the index is as selective as possible for the queries that will run the most frequently.
In order to measure the selectivity for specific columns, get the ratio of the number of distinct indexed values to the total number of rows in the table – the column that has the higher selectivity should be the first one.
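The ratio described above can be computed with a query along these lines. The column names column_a and column_b and the index name demo_multi_index are hypothetical placeholders.

```sql
-- Selectivity = distinct values / total rows; values closer to 1 are more selective.
SELECT
    COUNT(DISTINCT column_a) / COUNT(*) AS column_a_selectivity,
    COUNT(DISTINCT column_b) / COUNT(*) AS column_b_selectivity
FROM demo_table;

-- If column_a turns out to be the more selective one, it goes first.
CREATE INDEX demo_multi_index ON demo_table(column_a, column_b);
```

Treat the result as a starting point rather than a rule: the queries that actually run against the table still decide which order works best.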
Prefix Indexes
Sometimes you also need to index very long character columns, and in that case, you can often save time and resources by indexing the first few characters – a prefix – instead of the whole value.
Prefix indexes can be useful when the columns contain very long string values, which would mean that adding an index on the whole column would consume a lot of disk space. MySQL helps to address this issue by allowing you to only index a prefix of the value which in turn makes the index size smaller. Take a look:
CREATE TABLE `demo_table` (
    `demo_column` VARCHAR(100) NOT NULL,
    INDEX(demo_column(10))
);
The above query would create a prefix index on the demo column only indexing the first 10 characters of the value. You can also add a prefix index to an existing table:
CREATE INDEX index_name ON table_name(column_name(length));
So, for example, if you would want to index the first 5 characters of a demo_column on a demo_table, you could run the following query:
CREATE INDEX demo_index ON demo_table(demo_column(5));
You should choose a prefix that is long enough to give good selectivity, but also short enough to save space. This might be easier said than done though – you need to experiment and find the solution that works for you.
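One way to run that experiment is to compare the selectivity of several candidate prefix lengths against the selectivity of the full column. This is a sketch with arbitrarily chosen lengths of 5 and 10.

```sql
-- Compare how selective different prefix lengths are relative to the full value.
SELECT
    COUNT(DISTINCT LEFT(demo_column, 5)) / COUNT(*) AS prefix_5,
    COUNT(DISTINCT LEFT(demo_column, 10)) / COUNT(*) AS prefix_10,
    COUNT(DISTINCT demo_column) / COUNT(*) AS full_column
FROM demo_table;
```

A prefix length whose ratio comes close to that of the full column is usually a reasonable candidate.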
Covering Indexes
A covering index “covers” all of the fields required to execute a query. In other words, when all fields in a query are covered by an index, a covering index is in use. For example, for a query like so:
SELECT id, title FROM demo_table WHERE id = 1;
A covering index might look like this:
INDEX index_name(id, title);
If you want to make sure that a query uses a covering index, issue an EXPLAIN statement on it, then take a look at the Extra column. For example, if your table has a multicolumn index on id and title and a query that accesses only these two columns is executed, MySQL will use the index:
mysql> EXPLAIN SELECT id, title FROM demo_table\G
*************************** 1. row ***************************
<...>
type: index
key: index_name
key_len: 5
rows: 1000
Extra: Using index
<...>
Keep in mind that a covering index must store the values from the columns it covers. That means that MySQL can only use B-Tree indexes to cover queries because other kinds of indexes do not store these values.
Clustered, Secondary Indexes, and Index Cardinality
When indexes are discussed, you might also hear the terms clustered, secondary indexes, and index cardinality. Put simply, clustered indexes are an approach to data storage and all indexes other than clustered indexes are secondary indexes. Index cardinality on the other hand is the number of unique values in an index.
A clustered index speeds up queries because close values are also stored close to each other on the disk. Since rows can be physically stored in only one order, that’s also the reason why you can only have one clustered index in a table.
A secondary index is any index that isn’t the primary index. Such an index may have duplicates.
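To see these terms on a real table, the statements below list the indexes and their cardinality. The schema name demo_db is hypothetical, and the cardinality MySQL reports is an estimate based on table statistics rather than an exact count.

```sql
-- List all indexes on the table, including the Cardinality column.
SHOW INDEX FROM demo_table;

-- The same information is available from INFORMATION_SCHEMA.
SELECT INDEX_NAME, COLUMN_NAME, CARDINALITY
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = 'demo_db' AND TABLE_NAME = 'demo_table';

-- Refresh the statistics that the cardinality estimate is based on.
ANALYZE TABLE demo_table;
```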
The Drawbacks of Using Indexes
The usage of indexes certainly has upsides, but we mustn’t forget that indexes can be one of the leading causes of issues in MySQL too. Some of the drawbacks of using indexes are as follows:
- Indexes can degrade the performance of certain queries – even though indexes tend to speed up the performance of SELECT queries, they slow down the performance of INSERT, UPDATE, and DELETE queries because when the data is updated the index also needs to be updated together with it: any operation that involves manipulating the indexes will be slower than usual;
- Indexes consume disk space – an index occupies its own space, so indexed data will consume more disk space too;
- Redundant and duplicate indexes can be a problem – MySQL allows you to create duplicate indexes on a column and it does not “protect you” from making such a mistake. Take a look at this example:
CREATE TABLE `demo_table` (
    `id` INT(10) NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `column_2` VARCHAR(10) NOT NULL,
    `column_3` VARCHAR(10) NOT NULL,
    INDEX(id),
    UNIQUE(id)
);
An inexperienced user might think that this query makes the id column increment automatically, then adds an index on the column and makes the column not accept duplicate values. However, this isn’t what’s happening here. In this case, the same column has three indexes on it: an ordinary INDEX, and since MySQL implements both PRIMARY KEY and UNIQUE constraints with indexes, that adds two more indexes on the same column!
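A possible cleanup for the table above is sketched below. The index names id and id_2 are assumptions based on how MySQL usually names unnamed indexes, so check the output of SHOW INDEX before dropping anything.

```sql
-- List every index on the table, including the ones MySQL named automatically.
SHOW INDEX FROM demo_table;

-- Drop the two redundant indexes; the PRIMARY KEY already provides
-- both the uniqueness guarantee and the lookup path on id.
ALTER TABLE demo_table
    DROP INDEX id,
    DROP INDEX id_2;
```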
To conclude, indexes in MySQL have their own place – indexes can be used in a multitude of scenarios, but each of those usage scenarios has its own downsides which must be considered in order to get the most out of the indexes that are in use.
To use indexes well, profile your queries, take a look at what options you have when it comes to indexes, know their benefits and disadvantages, decide what indexes you need based on your requirements and after you index the columns, make sure your indexes are actually used by MySQL. If you have indexed your schema properly, the performance of your queries should improve, but if the response time doesn’t satisfy you, see if a better index can be created in order to improve it.