Monitoring and management are critical to any production environment because performance matters. Slow user interfaces that lag or do not respond, delayed alerts, and cluster job timeouts when the server is starved of resources are all things that can cause trouble. There are ways to improve the performance of ClusterControl, especially if you are managing multiple clusters and each cluster contains multiple nodes. This blog provides some tuning tips. The points elaborated here are curated based on our experience dealing with performance issues reported by our users and customers.
As an introduction, ClusterControl consists of several main components – a web application (frontend) based on PHP and a number of daemonized processes (backend). These leverage a MySQL/MariaDB database for persistent storage. You are effectively controlling your cluster from the web application, which will be translated to a series of process calls executed by the backend processes to manage and monitor your database clusters.
ClusterControl components rely on a MySQL or MariaDB database as the persistent store for monitoring data collected from the managed nodes and all ClusterControl metadata (e.g., what jobs there are in the queue, backup schedules, backup statuses, etc.). By default, the installer script will install whatever version comes by the standard repository of the OS. The following is the MySQL/MariaDB version being installed by the installer:
CentOS/Redhat 6 – MySQL 5.1
CentOS/Redhat 7 – MariaDB 5.5
Ubuntu 18.04 (Bionic) – MySQL 5.7
Ubuntu 16.04 (Xenial) – MySQL 5.7
Ubuntu 14.04 (Trusty) – MySQL 5.5
Debian 9 (Stretch) – MySQL 5.5
Debian 8 (Jessie) – MySQL 5.5
Debian 7 (Wheezy) – MySQL 5.5
The installer script would do some basic tuning like configuring the datadir location, MySQL port, user, and InnoDB buffer pool size at the beginning of the installation stage. However, the tuning might not be suitable once you have imported or created more clusters/nodes. With an increased number of nodes to be monitored and managed, ClusterControl would use more resources, and the database layer is commonly the first bottleneck that users encounter. Some further tuning might be needed to contain the incoming load.
ClusterControl is smart enough to detect any performance anomaly by writing up the following lines inside cmon_X.log (where X is the cluster ID):
2018-11-28 01:30:00 : (INFO) CmonSheetManager at 0x3839af0 loaded 70 values for key 'diskstat' between 2018-11-23 01:30:00 - 2018-11-28 01:30:0 0. 2018-11-28 01:30:00 : (INFO) SQL processing: 220.0000ms 2018-11-28 01:30:00 : (INFO) Parsing : 0.0000ms 2018-11-28 01:30:00 : (INFO) Sum : 220.0000ms
The above simply means it took 220ms (Sum value) to load 70 values for component “diskstat,” where most of the processing time was happening at the SQL processing stage and 0 ms to parse the SQL resultset. This concludes that the SQL layer took most of the processing time when ClusterControl tried to query the dataset.
We believe most of the SQL queries executed by ClusterControl are properly optimized for a single MySQL instance and use proper indexing. Simply said, if you see something like the above appearing regularly in the log file, some improvements to the database layer are in order, as shown in the following sections.
Tuning InnoDB Buffer Pool Size
Buffer pool size is an important component and has to be configured upfront to improve the MySQL performance. It allows MySQL processing to be happening inside memory instead of hitting the disk. A simple rule of thumb is to check the InnoDB hit ratio and look for the following line under BUFFER POOL AND MEMORY section:
Buffer pool hit rate 986 / 1000
The hit rate of 986 / 1000 indicates that out of 1000 row reads, it was able to read the row in RAM 986 times. The remaining 14 times, it had to read the row of data from disk. Simply said, 1000 / 1000 is the best value that we are trying to achieve here, which means the frequently-accessed data fits fully in RAM.
Increasing the innodb_buffer_pool_size value will help a lot to accomodate more room for MySQL to work on. However, ensure you have sufficient RAM resources beforehand. By default, ClusterControl allocates 50% of the RAM to the buffer pool. If the host is dedicated to ClusterControl, you can even push it to a higher value like 70%, provided you spare at least 2GB of RAM to the OS processes and caches. If you can’t allocate that much, increasing the RAM is the only solution.
Changing this value requires a MySQL restart (older than MySQL 5.7.5), thus the correct service restart ordering will be:
- Modify innodb_buffer_pool_size value inside my.cnf.
- Stop the CMON service.
- Restart MySQL/MariaDB service.
- Start the CMON service.
Or simply reboot the host if you can afford a longer downtime.
By default, the installer script will configure max_connections value up to 512. This is rather high, although sane, since ClusterControl barely reaches 200 connections in total, unless the MySQL server is shared with other applications or you have tens of MySQL nodes monitored by ClusterControl (we are talking about 30 nodes and more).
A high max_connections value wastes resources and adjusting the value will affect the maximum memory configured for MySQL. If it is greater than System RAM then there is a chance that the MySQL Server process will terminate with an OOM exception, if all connections are used.
To check on this, simply look for max_used_connections MySQL status. The following is the maximum connections ever reached by MySQL on a ClusterControl node that monitors 2 clusters with 6 MySQL nodes in total:
mysql> SHOW STATUS like 'max_used_connections'; +----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | Max_used_connections | 43 | +----------------------+-------+
A good value to start is Max_used_connections x 2, and gradually increase it if the value is consistently growing. Modifying the max_connections variable can be done dynamically via SET GLOBAL statement.
Using MySQL socket file
By default, the installer script will automatically configure the following host value inside every ClusterControl database configuration files:
|/etc/cmon.d/cmon_X.cnf (X is the cluster ID)||mysql_hostname=127.0.0.1|
The above will force the MySQL client to connect via TCP networking, just like connecting to a remote MySQL host although ClusterControl is running on the same server as the MySQL server. We purposely configured it this way to simplify the installation process since almost every OS platform pre-configures MySQL socket file differently, which greatly reduces the installation failure rate.
Changing the value to “localhost” will force the client to use the MySQL Unix socket file instead:
|/etc/cmon.d/cmon_X.cnf (X is the cluster ID)||mysql_hostname=localhost|
On Unix based systems, MySQL programs treat the host name localhost specially, in a way that is likely different from what you expect compared to other network-based programs. For connections to localhost, MySQL programs attempt to connect to the local server by using a Unix socket file. This occurs even if a –port or -P option is given to specify a port number.
Using MySQL UNIX socket file is much more secure and will cut off the network overhead. It is always recommended over TCP. However, you need to make sure the socket file is configured correctly. It must exist on the following directives inside my.cnf and every MySQL option files on ClusterControl node, for example:
[mysqld] socket=/var/lib/mysql/mysql.sock [client] socket=/var/lib/mysql/mysql.sock [mysql] socket=/var/lib/mysql/mysql.sock [mysqldump] socket=/var/lib/mysql/mysql.sock
Changing the socket file will also require a MySQL and CMON restart. If you are using the “localhost”, you can then add some additional configuration options like skip-networking=1, to prevent accepting remote connections. Our ClusterControl Docker image is using this approach to overcome a limitation in docker-proxy when binding on ports.
OpenSSH with SSSD
ClusterControl uses SSH protocol as its main communication channel to manage and monitor remote nodes. The default OpenSSH configurations are pretty decent and should work in most cases. However, in some environments where SSH is integrated with other security enhancement tools like SElinux or System Security Services Daemon (SSSD), it could bring significant impact to the SSH performance.
We have seen many cases where an ever increasing amount of SSH connections established to each of the managed nodes and eventually, both the ClusterControl server and all managed nodes max out their system memory with SSH connections. In some cases, only a normal full system reboot nightly on the ClusterControl server could solve the problem.
If you running your infrastructure with System Security Services Daemon (SSSD), it’s advised for you to comment the following line inside OpenSSH client configuration at /etc/ssh/ssh_config on ClusterControl node:
#ProxyCommand /usr/bin/sss_ssh_knownhostsproxy -p %p %h
The above will skip using SSSD to manage the host key, which will be handled by OpenSSH client instead.
Starting from ClusterControl 1.7.0, you have an option to use agent-based monitoring tool with Prometheus. With agent-based monitoring, ClusterControl does not use SSH to sample host metrics which can be excessive in some environments.
File System and Partitioning
ClusterControl controller writes new entry in its log file almost every second for every cluster. For those who wants to take advantage of this sequential writes on disk and would like to save cost, you can use a spindle disk for this purpose. Modify the following line inside all cmon_X.cnf:
Replace X with the cluster ID and restart CMON service to apply the changes.
If you are using ClusterControl as the backup repository, it’s recommended for you to allocate sufficient disk space on a separate partition other than the root partition. It gets better if the partition resides on a networked or clustered file system for easy mounting with the targeted nodes when performing restoration operation. We have seen cases where the created backups ate up all disk space of the main partition, eventually impacting ClusterControl and its components.
Keep up to Date
ClusterControl has a short release cycle – at least one new major release every quarter of the year plus weekly maintenance patches (mostly bug fixes – if any). The reason is ClusterControl supports multiple database vendors and versions, operating systems, and hardware platforms. There are often new things being introduced and old things being deprecated from what is provided. ClusterControl has to keep up with all the changes introduced by application vendors and follow the best-practice every time.
We recommend users to always use the latest version of ClusterControl (upgrading is easy) together with the latest web browser (built and tested on Google Chrome and Mozilla Firefox), as we are very likely taking advantage of the new features available in the latest version.
If you have any questions or encounter any problems or slowness issues when using ClusterControl, please contact us via our support channel. Suggestions and feedback are very much welcome.