A database server needs CPU, memory, disk and network in order to function. Understanding these resources is important for a DBA, as any resource that is weak or overloaded can become a limiting factor and cause the database server to perform poorly. A main task of the DBA is to tune operating system and database configurations and avoid overutilization or underutilization of the available resources.
In this blog post, we’ll discuss some of the settings that are most often tweaked and which can bring you significant improvement in the performance. We will also cover some of the variables which are frequently modified even though they should not. Performance tuning is not easy, but you can go a surprisingly long way with a few basic guidelines.
This is the eighth installment in the ‘Become a MySQL DBA’ blog series. Our previous posts in the DBA series include Live Migration using MySQL Replication, Database Upgrades, Replication Topology Changes, Schema Changes, High Availability, Backup & Restore, Monitoring & Trending.
Performance tuning - a continuous process
Installing MySQL is usually the first step in the process of tuning both OS and database configurations. This is a never-ending story as a database is a dynamic system. Your MySQL database can be CPU-bound at first, as you have plenty of memory and little data. With time, though, it may change and disk access may become more frequent. As you can imagine, the configuration of a server where I/O is the main concern will look different to that of a server where all data fits in memory. Additionally, your query mix may also change in time and as such, access patterns or utilization of the features available in MySQL (like adaptive hash index), can change with it.
What’s also important to keep in mind is that, most of the time, tweaks in MySQL configuration will not give you significant difference in performance. There are couple of exceptions but you should not expect anything like 10x improvement or something similar. Adding a correct index may help you much more than tweaking your my.cnf.
The tuning process
Let’s start with a description of the tuning process.
To begin, you would need a deterministic environment to test your changes and observe results. The environment should be as close to production as possible. By that we mean both data and traffic. For safety reasons you should not implement and test changes directly on the production systems. It’s also much easier to make changes in the testing environment - some of the tweaks require MySQL to restart - this is not something you can do on production.
Another thing to keep in mind - when you make changes, it is very easy to lose track of which change affected your workload in a particular way. People tend to take shortcuts and make multiple tweaks at the same time - it’s not the best way. After implementing multiple changes at the same time, you do not really know what impact each of them had. The result of all the changes is known but it’s not unlikely that you’d be better off implementing only one of the five changes you made.
After each config change, you also want to ensure your system is in the same state - restore the data set to a known (and always the same) position - e.g., you can restore your data from a given backup. Then you need to run exactly the same query mix to reproduce the production workload - this is the only way to ensure that your results are meaningful and that you can reproduce them. What’s also important is to isolate the testing environment from the rest of your infrastructure so your results won’t be affected by external factors. This means that you do not want to use VMs located on a shared host as other VMs may impact your tests. The same is true for storage - shared SAN may cause some unexpected results.
You’ll want to check operating system settings related to the way the memory and filesystem cache are handled. In general, we want to keep both vm.dirty_ratio and vm.dirty_background_ratio low.
vm.dirty_background_ratio is the percentage of system memory that can be used to cache modified (“dirty”) pages before the background flush process kicks in. More of them means more work needs to be done in order to clean the cache.
vm.dirty_ratio, on the other hand, is a hard limit of the memory that can be used to cache dirty pages. It can be reached if, due to high write activity, the background process cannot flush data fast enough to keep up with new modifications. Once vm.dirty_ratio is reached, all I/O activity is locked until dirty pages have been written to disk. Default setting here is usually 40% (it may be different in your distribution), which is pretty high for any host with large memory. Let’s say that for a 128GB instance, it amounts to ~51GB which may lock your I/O for a significant amount of time, even if you are using fast SSD’s.
In general, we want to see both of those variables set to low numbers, 5 - 10%, as we want background flushing to kick in early on and to keep any stalls as short as possible.
Another important system variable to tune is vm.swappiness. When using MySQL we do not want to use swap unless in dire need - swapping out InnoDB buffer pool to disk removes a point of having an in-memory buffer pool. On the other hand, if the alternative is to start OOM and kill MySQL, we’d prefer not to do that. Historically, such behavior could have been achieved by setting vm.swappiness to 0. Since kernel 3.5-rc1 (and this change has been backported to older kernels in some of the distros - CentOS for example), this behavior has changed and setting it to 0 prevents swapping. Therefore it’s recommended to set vm.swappiness to 1, to allow some of the swapping to happen should it be the only option to keep MySQL up. Sure, it will slow down the system but OOM on MySQL is very harsh. It may result in data loss (if you do not run with full durability settings) or, in the best case scenario, trigger InnoDB recovery, a process which may take some time to complete.
Another memory-related setting - ensure that you have NUMA interleave set to all. You can do it by modifying the startup script to start MySQL via:
numactl --interleave=all $command
This setting balances memory allocation between NUMA nodes and minimizes chances that one of the nodes go out of memory.
Memory allocators can also have a significant impact on MySQL performance. This is a larger topic and we’ll only scratch the surface here. You can choose different memory allocators to use with MySQL. Their performance differ between the versions and between workloads so the exact choice should be made only after you performed detailed tests to confirm which one works best in your environment. Most common choices you’ll be looking into are default glibc malloc, tcmalloc and jemalloc. You can add new allocators by installing a new package (for jemalloc and tcmalloc) and then use either LD_PRELOAD (i.e. export LD_PRELOAD="/usr/lib/libtcmalloc_minimal.so.4.1.2") or malloc-lib variable in [mysqld_safe] section of my.cnf.
Next, you’d want to take a look at disk schedulers. CFQ, which is usually the default one, is tuned for a desktop workload. It doesn’t work well for a database workload. Most of the time you’ll see better results if you change it to noop or deadline. There’s little difference between those two schedulers, we found that noop is slightly better for storage based on SAN (SAN usually is better in handling the workload as it knows more about the underlying hardware and what’s actually stored in its cache as compared to the operating system). Differences are minimal, though, and most of the time you won’t go wrong by using any of those options. Again, testing may help you squeeze a bit more from your system.
If we are talking about disks, most often the best choice for filesystem will be either EXT4 or XFS - this has changed a couple of times in the past, and if you’d like to get the most of your I/O subsystem, you’d probably have to do some testing on your setup. No matter which filesystem you use though, you should disable noatime and nodiratime for the MySQL volume - the less writes to the metadata, the lower the overall overhead.
MySQL configuration tuning
MySQL configuration tuning is a topic for a whole book, it’s not possible to cover it in a single blog post. We’ll try to mention some of the more important variables here.
InnoDB Buffer Pool
Let’s start with something rather obvious - InnoDB buffer pool. We still see, from time to time (although it becomes less and less frequent, which is really nice), that it’s not setup correctly. Defaults are way too conservative. What is the buffer pool and why is it so important? The buffer pool is memory used by InnoDB to cache data. It is used for caching both reads and writes - every page that has been modified, had to be loaded first to the buffer pool. It then becomes a dirty page - a page that has been modified and is not yet flushed to the tablespace. As you can imagine, such buffer is really important for a database to perform correctly. The worse the “memory/disk” ratio is, the more I/O bound your workload will be. I/O bound workloads tend to be slow.
You may have heard the rule of thumb to set the InnoDB buffer pool to 80% of the total memory in the system. It worked in times when 8GB was a huge amount of memory, but that is not true nowadays. When calculating the InnoDB buffer pool size, you need to take into consideration memory requirements of the rest of MySQL (assuming that MySQL is the only application running on the server). We are talking here, for example, about all those per-connection or even per-query buffers like join buffer or in-memory temporary table max size. You also need to take under consideration maximum allowed connections - more connections means more memory usage.
For a MySQL database server with 24 to 32 cores and 128GB memory, handling up to 20 - 30 of simultaneous running connections and up to a few hundreds of simultaneously connected clients, we’d say that 10 - 15GB of memory should be enough. If you want to stay on the safe side, 20GB should be plenty. In general, unless you know the behaviour of your database, it’s somewhat a process of trial and error to set up ideal buffer pool size. At the moment of writing, InnoDB buffer pool is not a dynamic variable so changes require restart. Therefore it is safer to err on the side of “too small”. It will change with MySQL 5.7 as Oracle introduced dynamically allocated buffer pool, something which will make tuning much easier.
MySQL uses many buffers other than the InnoDB buffer pool - they are controlled by variables: join_buffer_size, sort_buffer_size, read_buffer_size, read_rnd_buffer_size. These buffers are allocated per-session (with an exception of the join buffer, which is allocated per JOIN). We’ve seen MySQL with those buffers set to hundreds of megabytes - it’s somewhat natural that by increasing join_buffer_size, you’d expect your JOINs to perform faster.
By default those variables have rather small values and it actually makes sense - we’ve seen that low settings, up to 256K, can be significantly faster than larger values like for example 4M. It is hard to tell the exact reason for this behavior, most likely there are many of them. One, definitely, is the fact that Linux changes the way memory is allocated. Up to 256KB it uses malloc(). For larger chunks of memory - mmap(). What’s important to remember is that when it comes to those variables, any change has to be backed by benchmarks that confirm the new setting is indeed the correct one. Otherwise you may be reducing your performance instead of increasing it.
Another variable that has a significant impact on the MySQL performance is innodb_flush_log_at_trx_commit. It’s governing to what extend InnoDB is durable. Defaults (1) ensure your data is safe even if the database server gets killed - under any circumstances there’ll be no data loss. Other settings (2 and 0) say that you may lose up to 1s of transactions if the whole database server crashes (2) and that you may lose up to 1s of transactions if the mysqld gets killed.
Full durability is obviously a great thing to have but it comes at a significant price - I/O load is much higher because the flush operation has to happen after each commit. Therefore, under some circumstances, it’s very popular to reduce durability and accept the risk of data loss in certain conditions. It’s true for master - multiple slaves setups where, usually, it’s perfectly fine to have one slave in the rebuild process after a crash because the rest of them can easily handle the workload. Same is true for Galera clusters - the whole cluster works as a single instance so even if one node crashes and somehow loses its data, it still can resync from another node in the cluster - it’s not worth paying the high price for full durability (especially that writes in Galera are already more expensive than in regular MySQL) when you can easily recover from such situations.
Other variables which may have significant impact on some workloads are innodb_io_capacity, innodb_io_capacity_max and innodb_lru_scan_depth. Those variables define the number of disk operations that can be done by InnoDB’s background threads to, e.g., flush dirty pages from the InnoDB buffer pool. Default settings are conservative, which is fine most of the time. If your workload is very write-intensive, you may want to tune those settings and see if you are not preventing InnoDB from using your I/O subsystem fully. This is especially true if you have fast storage: SSD or PCIe SSD card.
When it comes to disks, innodb_flush_method is another setting that you may want to look at. We’ve seen visible performance gains by switching this setting from default fdatasync to O_DIRECT. Such gain is clearly visible on setups with hardware RAID controller which is backed up by the BBU. On the other hand, when it comes to EBS volumes, we’ve seen better results using O_DSYNC. Benchmarking here is very important to understand which setting would be better in your particular case.
InnoDB Redo Logs
The size of InnoDB’s redo logs is also something you may want to take a look at. It is governed by innodb_log_file_size and innodb_log_files_in_group. By default we have two logs in a group, each ~50MB in size. Those logs are used to store write transactions and they are written sequentially. The main problem here is that MySQL must not run out of space in the logs and if logs are almost full, it will have to stop the whole activity and focus on flushing the data to the tablespaces. Of course, this is very bad for the application as no writes can happen during this time. This is one of the reasons why InnoDB I/O settings, that we discussed above, are very important. We can also help by increasing the redo log size by changing innodb_log_file_size. The rule of thumb is to set them large enough to cover at least 1h of writes. We discussed InnoDB I/O settings in more details in an earlier post, where we also covered a method for calculating InnoDB redo log size.
MySQL query cache is also often “tuned” - this cache stores hashes of the SELECT statements and their results. There are two problems with it - first one, the cache may be frequently flushed. If any DML was executed against a given table, all results related to this table are removed from the query cache. This seriously impacts the usefulness of the MySQL query cache. Second problem is that the query cache is protected by a mutex and access is serialized. This is a significant drawback and limitation for any workload with higher concurrency. Therefore it is strongly recommended to “tune” MySQL query cache by disabling it altogether. You can do it by setting query_cache_type to OFF. It’s true that in some cases it can be of some use, but most of the time it’s not. Instead of relying on MySQL query cache, you can also leverage any other external systems like Memcached or Redis to cache data.
Internal contention handling
Another set of settings that you may want to look at are variables that control how many instances/partitions of a given structure that MySQL should create. We are talking here about the variables: innodb_buffer_pool_instances, table_open_cache_instances, metadata_locks_hash_instances and innodb_adaptive_hash_index_partitions. Those options were introduced when it became clear that, for example, a single buffer pool or single adaptive hash index can become a point of contention for workloads with high concurrency. Once you find out that one of those structures becomes a pain point (we discussed how you can catch these situations in an earlier blog post) you’ll want to adjust the variables. Unfortunately, there are no rules of thumb here. It’s suggested that a single buffer pool instance should be at least 2GB in size, so for smaller buffer pools you may want to stick to this limit. In case of the other variables, if we are talking about issues with contentions, you will probably increase the number of instances/partitions of those data structures, but there are no rules on how to do that - you need to observe your workload and decide at which point contention is no longer an issue.
There are a few other settings you may want to look at, some are applicable in the most efficient way at the setup time. Some can be changed dynamically. Those settings won’t have a large impact on the performance (sometimes the impact may also be negative one), but it is still important to keep them in mind.
max_connections - on one hand you want to keep it high enough to handle any incoming connections. On the other hand, you don’t want to keep it too high as most of the servers are not able to handle hundreds or more connections simultaneously. One way of going around this problem is to implement connection pooling on the application side, or e.g. using a load balancer like HAProxy to throttle the load.
log_bin - if you are using MySQL replication, you need to have binary logs enabled. Even if you do not use them, it’s very handy to keep them enabled as they can be used to do a point-in-time recovery.
skip_name_resolve - this variable decides whether DNS lookup is performed on the host that is a source of incoming connection. If enabled, FQDNs can be used in MySQL grants as host. If it’s not, only users defined with IP addresses as host will work. The problem of having DNS lookup enabled is that it can introduce extra latency. DNS servers can also stop responding (because of a crash or network issues) and in such case MySQL won’t be able to accept any new connections.
innodb_file_per_table - this variable decides if InnoDB tables are to be created in a separate tablespace (when set to 1) or in the shared tablespace (when set to 0). It’s much easier to manage MySQL when each of the InnoDB tables has a separate tablespace. For example, with separate tablespaces you can easily reclaim disk space by dropping the table or partition. With shared tablespace it doesn’t work - the only way of reclaiming the disk space is to dump the data, clean the MySQL data directory and then reload the data again. Obviously, this is not convenient.
That is it for now. As we mentioned at the beginning, tweaking those settings might not make your MySQL database blazing fast - you are more likely to speed it up by tuning your queries. But they should still have visible impact on the overall performance. Good luck with the tuning work!
- Blog - Become a DBA blog series - Monitoring and Trending
- Blog - Monitoring Host Metrics of your Database Instances - How to interpret Operating System Data
- Blog - Monitoring Galera Cluster for MySQL or MariaDB - Understanding metrics and their meaning
- Blog - Monitoring Galera/MySQL/MariaDB - Understanding and Optimizing CPU-related InnoDB metrics
- Blog - Monitoring Galera/MySQL/MariaDB - Understanding and Optimizing IO-related InnoDB metrics
- Webinar replay - Deep dive into Galera monitoring (Galera Cluster for MySQL/MariaDB, Percona XtraDB Cluster)