
6. Components

ClusterControl consists of a number of components:

Component Package naming Role
ClusterControl Controller (cmon) clustercontrol-controller The brain of ClusterControl. A backend service performing automation, management, monitoring and scheduling tasks. All collected data is stored in the CMON database.
ClusterControl UI clustercontrol A modern web user interface to visualize and manage the cluster. It interacts with CMON controller via remote procedure call (RPC) or REST API interface.
ClusterControl SSH clustercontrol-ssh Optional package introduced in ClusterControl 1.4.2 for ClusterControl’s web SSH console. Only works with Apache 2.4+.
ClusterControl Notifications clustercontrol-notifications Optional package introduced in ClusterControl 1.4.2 providing a service and user interface for notification services and integration with third party tools.
ClusterControl Cloud clustercontrol-cloud Optional package introduced in ClusterControl 1.5 providing a service and user interface for integration with cloud providers.
ClusterControl Cloud File Manager clustercontrol-clud Optional package introduced in ClusterControl 1.5 providing a command-line interface to interact with storage objects in the cloud.
ClusterControl CLI s9s-tools Open-source command line tool to manage and monitor clusters provisioned by ClusterControl.

ClusterControl Controller exposes all functionality through remote procedure calls (RPC) on port 9500 (authenticated by an RPC token) and port 9501 (RPC with TLS), and integrates with a number of modules such as notifications (9510), cloud (9518) and web SSH (9511). The client components, ClusterControl UI and ClusterControl CLI, interact with these interfaces to retrieve monitoring data (cluster load, host status, alarms, backup status, etc.) or to send management commands (add/remove nodes, run backups, upgrade a cluster, etc.).
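
For illustration, assuming the ClusterControl CLI (s9s-tools) is installed and configured against the controller, the same interfaces can be queried from the command line:

$ s9s cluster --list --long
$ s9s node --list --long --cluster-id=1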

The following diagram illustrates the architecture of ClusterControl:

ClusterControl architecture

6.1. ClusterControl Controller (CMON)

ClusterControl Controller (CMON) is the core backend process that performs all automation and management procedures. It is installed as /usr/sbin/cmon and comes with a collection of helper scripts in the /usr/bin directory (prefixed with s9s_) for specific tasks. However, some of these scripts have been deprecated because the corresponding tasks are now handled by the CMON core process.

ClusterControl Controller builds are available at the Severalnines Repository. The packages are also available at the Severalnines download site. RedHat-based systems should download and install the RPM package, while Debian-based systems should download and install the DEB package. The package name is formatted as:

  • RPM package (RedHat-based systems): clustercontrol-controller-[version]-[build number]-[architecture].rpm
  • DEB package (Debian-based systems): clustercontrol-controller-[version]-[build number]-[architecture].deb

A configuration file, /etc/cmon.cnf, is required to initially run the CMON Controller. It is possible to have several configuration files, one per cluster, as described in Configuration File.

6.1.1. Command Line Arguments

If you run cmon without any arguments, it runs in the background by default. ClusterControl Controller (cmon) supports several command line options as shown below:

Shorthand, Option Description
-h, --help Print the help.
--help-config Print the manual for configuration parameters. See Configuration Options.
--help-init Shows the special options for --init.
-v, --version Prints out the version number and build info.
--logfile=[filepath] The path of the log file to be used.
-s, --syslog Also log to syslog.
-b, --bind-addr='ip1,ip2..' Bind the Remote Procedure Call (RPC) interface to the given IP addresses (default is 127.0.0.1,::1). If other bind addresses are needed, they can be defined in the file /etc/default/cmon. See Startup File.
-c, --cloud-service=URL A custom clustercontrol-cloud service URL.
-d, --nodaemon Run in foreground. Ctrl + C to exit.
-e, --events-client=URL Additional RPC URL where backend sends events.
-g, --grant Create grants.
-i, --init Creates configuration file and database.
--log-rpc Log every RPC call (very verbose).
--no-safety-checks Do not check whether another cmon instance is connected.
-p, --rpc-port=[integer] Listen on RPC port. Default is 9500.
-r, --directory=[directory] Running directory.
-u, --upgrade-schema Try to upgrade the CMON schema (Supported from CMON version 1.2.12 and later).
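
For example (an illustrative invocation; adjust the port and log file path to your environment), cmon can be started in the foreground with a custom log file:

$ cmon --nodaemon --rpc-port=9500 --logfile=/var/log/cmon.log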

6.1.2. Startup File

To customize the cmon startup process, you can define the command line arguments in a custom file instead of editing the init script directly. The CMON init script (or systemd unit) reads the options defined in /etc/default/cmon when starting the cmon process and translates them into command line arguments. For example:

$ cat /etc/default/cmon
RPC_PORT=9500
RPC_BIND_ADDRESSES="10.10.10.13,192.168.33.1,127.0.0.1"
EVENTS_CLIENT=http://127.0.0.1:9510
CLOUD_SERVICE=http://127.0.0.1:9518

In the example above, cmon will bind to those IP addresses and listen on port 9500 once started. If you filter the ps output on the server, you should see the following:

/usr/sbin/cmon --rpc-port=9500 --bind-addr='10.10.10.13,192.168.33.1,127.0.0.1' --events-client='http://127.0.0.1:9510' --cloud-service='http://127.0.0.1:9518'

6.1.3. Configuration File

A single CMON Controller process is able to monitor one or more database clusters. Each cluster requires one exclusive configuration file residing in the /etc/cmon.d/ directory. The default CMON configuration file is located at /etc/cmon.cnf and is commonly used to store the default (minimal) configuration for the CMON process to run.

Example of the CMON main configuration file located at /etc/cmon.cnf:

mysql_port=3306
mysql_hostname=127.0.0.1
mysql_password=cm0nP4ss
mysql_basedir=/usr
hostname=10.0.0.196
logfile=/var/log/cmon.log
rpc_key=390faeffb8166277a4f25336a69efa50915635a7

For the first cluster (cluster_id=1), the configuration options should be stored inside /etc/cmon.d/cmon_1.cnf. For the second cluster, it would be /etc/cmon.d/cmon_2.cnf with cluster_id=2, and so on. The following shows example content of a CMON cluster configuration file located at /etc/cmon.d/cmon_4.cnf:

cluster_id=4
cmon_user=cmon
created_by_job=1
db_stats_collection_interval=30
enable_query_monitor=1
galera_vendor=percona
galera_version=3.x
group_owner=1
host_stats_collection_interval=60
hostname=10.0.0.196
logfile=/var/log/cmon_4.log
mode=controller
monitored_mountpoints=/var/lib/mysql/
monitored_mysql_port=3306
monitored_mysql_root_password='[email protected]'
mysql_bindir=/usr/bin/
mysql_hostname=127.0.0.1
mysql_password='cm0nP4ss'
mysql_port=3306
mysql_server_addresses=10.0.0.99:3306,10.0.0.253:3306,10.0.0.181:3306
mysql_version=5.6
name='Galera Cluster'
os=redhat
osuser=root
owner=1
pidfile=/var/run
basedir=/usr
repl_password='9hHRgQLSsZz3Vd4a'
repl_user=rpl_user
rpc_key=3V0RaV6dE8KSyClE
ssh_identity=/root/.ssh/id_rsa
ssh_port=22
type=galera
vendor=percona

An example of CMON configuration file hierarchy is as follows:

Example cluster Configuration file Cluster identifier Log file location
Default configuration /etc/cmon.cnf N/A logfile=/var/log/cmon.log
Cluster #1 (Galera) /etc/cmon.d/cmon_1.cnf cluster_id=1 logfile=/var/log/cmon_1.log
Cluster #2 (MongoDB) /etc/cmon.d/cmon_2.cnf cluster_id=2 logfile=/var/log/cmon_2.log
Cluster #N (cluster type) /etc/cmon.d/cmon_N.cnf cluster_id=N logfile=/var/log/cmon_N.log

Note

It is highly recommended to separate CMON logging for each cluster into its own log file. In the above example, cluster_id and logfile are the two important configuration options that distinguish one cluster from another.

The CMON Controller imports the configuration options defined in each configuration file into the CMON database during process startup. Once loaded, CMON uses this information to manage clusters based on the cluster_id value.
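
After adding or editing a configuration file, restart the controller so that the options are re-imported. On systemd-based distributions this is typically:

$ systemctl restart cmon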

6.1.4. Configuration Options

Values that contain special characters must be enclosed in single quotes. Any change to the CMON configuration file requires a service restart before it is applied. The configuration options can be divided into the following categories:

  1. General Cluster
  2. CMON
  3. Operating System
  4. SSH
  5. ClusterControl Recovery
  6. Monitoring and Thresholds
  7. Query Monitor
  8. Backup
  9. MySQL/MariaDB Nodes
  10. MongoDB Nodes
  11. PostgreSQL/TimescaleDB Nodes

The following is the list of configuration options inside the CMON configuration file. You can also see them by using the --help-config parameter in the terminal:

$ cmon --help-config

6.1.4.1. General Cluster

Option Description
cluster_id=<integer> Cluster identifier. This will be used by CMON to indicate which cluster to provision. It must be unique; two clusters cannot share the same ID. Example: cluster_id=1.
name=<string> Cluster name. The cluster name configured under ClusterControl > DB cluster > Settings > CMON Settings > Cluster Name takes precedence over this. Example: name='Galera Cluster'. Other alias: cluster_name.
type=<string> Cluster type. Supported values are “galera”, “mysql_single”, “mysqlcluster”, “mongodb”, “postgresql_single”, “replication”, “group_replication”. Example: type=galera. Other alias: cluster_type.
created_by_job=<integer> The ID of the job that created this cluster. This is usually generated automatically by ClusterControl. Example: created_by_job=13.
vendor=<string> Database vendor name. ClusterControl needs to know this in order to follow the vendor's naming conventions, especially for package names, daemon names, deployment steps, recovery procedures and more. Supported values at the moment are oracle, percona, codership, mariadb and 10gen. Example: vendor=percona.
use_internal_repos=<boolean integer> Setting which disables setting up third-party repositories. Default is 0 (false).
cmon_use_mail=<boolean integer> Setting to use the ‘mail’ command for e-mailing. Default is 0 (false).
enable_html_emails=<boolean integer> Enables sending of HTML e-mails. Default is 1 (true).
cmon_mail_sender=<email> The sender email address when sending out emails.
frontend_url=<url> The ClusterControl URL to be embedded inside e-mail notifications. Example frontend_url='https://monitor.domain.com/clustercontrol'
acl=<string> The Access Control List as a string controlling the access to the cluster object.

6.1.4.2. CMON

Option Description
hostname=<string> Hostname or IP address of the controller host. Example: hostname=192.168.0.10.
controller_id=<integer> An arbitrary identifier string of this controller instance. Example: controller_id=1.
mode=<string> CMON role. Supported values are “controller”, “dual”, “hostonly”. Example: mode=controller.
agentless=<boolean integer> CMON controller mode (deprecated). Agents are no longer supported. 0 for agent-based or 1 for agentless (default). Example: agentless=1.
logfile=<path> CMON log file location. This is where CMON logs its activity. The file will be automatically generated if it doesn’t exist. CMON will write to syslog by default. Example: logfile=/var/log/cmon.log.
pidfile=<path> CMON process identifier file directory. Keeping the default value is recommended. Example: pidfile=/var/run.
mysql_hostname=<string> The MySQL hostname or IP address where CMON database resides. Using IP address is recommended. Default is 127.0.0.1. Example: mysql_hostname=192.168.0.5. Other aliases: cmon_mysql_hostname, cmondb_hostname, local_mysql_hostname, cmon_local_mysql_hostname.
mysql_password=<string> The MySQL password for user cmon to connect to CMON database. Example: mysql_password='cM%^nP4ss'. Other aliases: cmon_mysql_password, cmondb_password.
mysql_port=<integer> The MySQL port used by CMON to connect to CMON database. Example: mysql_port=3306. Other aliases: cmon_mysql_port, cmondb_port.
mysql_basedir=<path> The MySQL base directory used by CMON to find MySQL client related binaries. Example: mysql_basedir=/usr. Other alias: basedir.
mysql_bindir=<path> The MySQL binary directory used by CMON to find MySQL client related binaries. Example: mysql_bindir=/usr/bin.
config_file_path=<path> The config file path (read-only) for CMON instance.
cmon_db=<string> CMON database name. Defaults to "cmon". Example: cmon_db=cmon. Other alias: cmondb_database.
cmon_user=<string> The username for the CMON database. Defaults to "cmon". Example: cmon_user=cmon. Other alias: cmondb_user.
cmondb_ssl_key=<path> Path to SSL key, for SSL encryption between CMON process and the CMON database. Example: cmondb_ssl_key=/etc/ssl/mysql/client-key.pem.
cmondb_ssl_cert=<path> Path to SSL certificate, for SSL encryption between CMON process and the CMON database. Example: cmondb_ssl_cert=/etc/ssl/mysql/client-cert.pem.
cmondb_ssl_ca=<path> Path to SSL CA, for SSL encryption between CMON process and the CMON database. Example: cmondb_ssl_ca=/etc/ssl/mysql/ca-cert.pem.
cluster_ssl_key=<path> Path to SSL key, for SSL encryption between CMON process and managed MySQL Servers. Example: cluster_ssl_key=/etc/ssl/mysql/client-key.pem.
cluster_ssl_cert=<path> Path to SSL certificate, for SSL encryption between CMON process and managed MySQL Servers. Example: cluster_ssl_cert=/etc/ssl/mysql/client-cert.pem.
cluster_ssl_ca=<path> Path to SSL CA, for SSL encryption between CMON and managed MySQL Servers. Example: cluster_ssl_ca=/etc/ssl/mysql/ca-cert.pem.
cluster_certs_store=<path> Directory path to store SSL related files. This is required when you want to add a new node to an encrypted Galera cluster. Example: cluster_certs_store=/etc/ssl/galera/cluster_1.
rpc_key=<string> Unique secret token for authentication. To interact with an individual cluster via the CMON RPC interface (port 9500), this key must be used, otherwise you will get 'HTTP/1.1 403 Access denied'. The ClusterControl UI needs this key, stored as the RPC API Token, to communicate with the CMON RPC interface. Each cluster should be configured with a different rpc_key value. This value is automatically generated when a new cluster/server is created or added into ClusterControl. Example: rpc_key=VJZKhr5CvEGI32dP. Other alias: rpc_api_key.
net_read_timeout=<integer> Network read timeout value in seconds (DB connections). Default is 10.
net_write_timeout=<integer> Network write timeout value in seconds (DB connections). Default is 10.
connect_timeout=<integer> Network connect timeout value in seconds (DB connections). Default is 10.
owner=<integer> The Cmon user ID of the owner.
group_owner=<integer> The Cmon group ID of the group owner.
cdt_path=<string> The location in the Cmon Directory Tree.
acl=<string> The Access Control List as a string.
stagingdir=<path> Staging path for temporary files. Example: stagingdir=/home/ubuntu/s9s_tmp. Other alias: staging_dir.
plugin_dir=<path> Directory path for CMON plugins. Default to /var/cmon/plugins. Other alias: plugin_path.
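
As an illustration, a minimal set of CMON database connection options (with optional SSL towards the CMON database) could look like the following; all values are examples only:

mysql_hostname=127.0.0.1
mysql_port=3306
cmon_user=cmon
mysql_password='cm0nP4ss'
cmondb_ssl_ca=/etc/ssl/mysql/ca-cert.pem
cmondb_ssl_cert=/etc/ssl/mysql/client-cert.pem
cmondb_ssl_key=/etc/ssl/mysql/client-key.pem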

6.1.4.3. Operating System

Option Description
os=<string> Operating system running across the cluster, including the ClusterControl host. 'redhat' for RedHat-based distributions (CentOS/Red Hat Enterprise Linux/Oracle Linux) or 'debian' for Debian-based distributions (Debian/Ubuntu). Example: os=redhat.
osuser=<string> Operating system user that will be used by CMON to perform automation tasks like cluster recovery, backups and upgrades. This user must be able to perform super-user activities. Using root is recommended. Example: osuser=root. Other aliases: os_user, ssh_user.
sudo=<command> The command used to obtain superuser permissions. If sudo user requires password, specify the sudo command with sudo password here. The sudo command must be trimmed by redirecting stderr to somewhere else. Therefore, it is compulsory to have -S 2>/dev/null appended in the sudo command. Example: sudo="echo 'My5ud0' | sudo -S 2>/dev/null". Other alias: sudo_opts.
osuser_home=<path> The home directory of the user used on nodes. Example: osuser_home=/home/ubuntu/. Other aliases: os_user_home, ssh_user_home.
software_packagedir=<path> The storage location of software packages, i.e., all files necessary to successfully install a node when no yum/apt repository is available must be placed here. Applies mainly to MySQL Cluster or older Codership/Galera installations.
local_repo_name=<string> The used local repository names for cluster deployment. Example: local_repo_name=epel.
init_name=<string> The OS service name used for starting/stopping the database servers. Example: init_service_name=postgresql-9.6.
disable_numa=<boolean integer> Do not use NUMA support dependent features. Default is 1 (enabled).

6.1.4.4. SSH

Option Description
ssh_identity=<path> The SSH key or key pair file that will be used by CMON to connect to managed nodes (including the ClusterControl node) passwordlessly. If undefined, CMON will use the home directory of os_user and look for the .ssh/id_rsa file. Example: ssh_identity=/root/.ssh/id_rsa. Other aliases: ssh_keypath, identity_file.
ssh_port=<integer> The SSH port used by CMON to connect to managed nodes. If undefined, defaults to 22. Example: ssh_port=22.
ssh_acquire_tty=<boolean integer> Setting for libssh - should it acquire a remote tty. Default is 1 (true). Example: ssh_acquire_tty=1.
ssh_password=<string> The SSH password used for connection to nodes.
ssh_timeout=<integer> Network timeout value in seconds for SSH connections. Default is 30. Example: ssh_timeout=30. Other alias: libssh_timeout=<integer>.
libssh_loglevel=<integer> Setting for libssh logging verbosity to stdout. Accepted values are 0 (NONE), 1 (WARN), 2 (INFO), 3 (DEBUG), 4 (TRACE). Example: libssh_loglevel=2.
slow_ssh_warning=<integer> A warning alarm will be raised if it takes longer than the specified time (in seconds) to set up an SSH connection. Default is 6 seconds. Example: slow_ssh_warning=6.
slow_ssh_critical=<integer> A critical alarm will be raised if it takes longer than the specified time (in seconds) to set up an SSH connection. Default is 12 seconds. Example: slow_ssh_critical=12.
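
For example, a cluster reachable over a non-default SSH port with a dedicated key could use the following options (illustrative values):

ssh_identity=/root/.ssh/id_rsa
ssh_port=2022
ssh_timeout=30
slow_ssh_warning=6
slow_ssh_critical=12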

6.1.4.5. ClusterControl Recovery

Option Description
enable_cluster_autorecovery=<boolean integer> If undefined, CMON defaults to 0 (false) and will NOT perform automatic recovery if it detects a cluster failure. Supported values are 1 (cluster recovery is enabled) or 0 (cluster recovery is disabled).
enable_node_autorecovery=<boolean integer> If undefined, CMON defaults to 0 (false) and will NOT perform automatic recovery if it detects a node failure. Supported values are 1 (node recovery is enabled) or 0 (node recovery is disabled).
enable_autorecovery=<boolean integer> If undefined, CMON defaults to 0 (false) and will NOT perform automatic recovery if it detects a node or cluster failure. Supported values are 0 (cluster and node recovery are disabled) or 1 (cluster and node recovery are enabled). This setting internally sets enable_node_autorecovery and enable_cluster_autorecovery to the specified value.
node_recovery_lock_file=<path> Specify a lock file; if it is present on a node, the node will not be recovered. The administrator is responsible for creating/removing the file. Example: node_recovery_lock_file=/root/do_not_recover.
node_recovery_timeout=<integer> Stop the recovery process after it reaches this timeout. Default is 28800 seconds (8 hours). Other alias: add_node_timeout.
send_clear_cluster_failure=<boolean integer> Send notification email if a cluster failure event is cleared (meaning the cluster is recovered). Default is true. Other alias: send_clear_alarm.
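
For example, to enable both cluster and node automatic recovery while keeping a manual override lock file (illustrative values):

enable_cluster_autorecovery=1
enable_node_autorecovery=1
node_recovery_lock_file=/root/do_not_recover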

6.1.4.6. Monitoring and Thresholds

Option Description
monitored_mountpoints=<paths> Comma-separated list of mount points monitored for disk performance, i.e. the MySQL/MongoDB/PostgreSQL data directories used by the database nodes. Example: monitored_mountpoints=/var/lib/mysql,/mnt/data/mysql. Other alias: monitored_mount_points.
monitored_nics=<string> Comma-separated list of network interface names to be monitored for network performance. Example: monitored_nics=eth1,eth2.
db_stats_collection_interval=<integer> Database metrics sampling interval in seconds. The lowest value is 1. Default is 30 seconds. Example: db_stats_collection_interval=30.
host_stats_collection_interval=<integer> Host metrics sampling interval in seconds. The lowest value is 1. Default is 30 seconds. Example: host_stats_collection_interval=30.
lb_stats_collection_interval=<integer> Load balancer stats collection interval. Default is 15. Example: lb_stats_collection_interval=30.
db_schema_stats_collection_interval=<integer> How often database growth and table checks are performed in seconds. This translates to information_schema queries. Default is 10800 seconds (3 hours). 0 means disabled. Example: db_schema_stats_collection_interval=10800.
db_proc_stats_collection_interval=<integer> Setting for database process stats collection interval. Default is 3 seconds. Minimum allowed value is 1 second. Example: db_proc_stats_collection_interval=5.
db_log_collection_interval=<integer> Database log files collection interval. Default is 600. Example: db_log_collection_interval=600.
db_deadlock_check_interval=<integer> How often to check for deadlocks in seconds. Deadlock detection will affect CPU usage on database nodes. Default is 0, means disabled. Example: db_deadlock_check_interval=600.
db_schema_max_objects=<integer> Maximum number of database objects that ClusterControl will pull from the monitored database nodes. If the number of schema objects (tables, triggers, views) is greater than this, no schema analysis will be done. Example: db_schema_max_objects=500.
db_exporter_user=<string> Database user for Prometheus exporter. Default is db_exporter_user=cmonexporter.
db_exporter_password=<string> Password for db_exporter_user. Example: db_exporter_password=myS3cret.
db_exporter_use_nonlocal_address=<boolean integer> Specifies if Prometheus exporter should connect to the non-local address of the DB services, instead of 127.0.0.1. Default is 0 (false). Example: db_exporter_use_nonlocal_address=1.
db_hourly_stats_collection_interval=<integer> Database statistic collections interval in seconds. Default is 5. Example: db_hourly_stats_collection_interval=5.
enable_mysql_timemachine=<boolean integer> This determines whether ClusterControl should enable MySQL time machine status and variable collections. The status time machine allows you to select a status variable for a time range and compare the values at the start and end of that range from the ClusterControl UI. Default is 0, meaning it is disabled. Example: enable_mysql_timemachine=1.
swap_warning=<integer> Warning alarm threshold for swap usage. Default is 5. Also configurable at ClusterControl > {cluster_id} > Settings > Thresholds. Example: swap_warning=20.
swap_critical=<integer> Critical alarm threshold for swap usage. Default is 20. Also configurable at ClusterControl > {cluster_id} > Settings > Thresholds. Example: swap_critical=40.
swap_inout_period=<integer> The interval for swap I/O alarms in seconds. 0 means disabled. Default is 600 (10 minutes). Example: swap_inout_period=120.
swap_inout_warning=<integer> The number of pages swapped in/out within the specified swap_inout_period to raise a warning alarm. Default is 10240. To determine the page size for the host, use getconf PAGESIZE. Example: swap_inout_warning=51200.
swap_inout_critical=<integer> The number of pages swapped in/out within the specified swap_inout_period to raise a critical alarm. Default is 102400. To determine the page size for the host, use getconf PAGESIZE. Example: swap_inout_critical=102400.
save_history_days=<integer> How many days the controller shall keep data. Default is 7. 0 means disabled.
mysqlmemory_warning=<integer> Warning alarm threshold for MySQL memory. Default is 80.
mysqlmemory_critical=<integer> Critical alarm threshold for MySQL memory. Default is 90.
ram_warning=<integer> Warning alarm threshold for RAM usage. Default is 80.
ram_critical=<integer> Critical alarm threshold for RAM usage. Default is 90.
diskspace_warning=<integer> Warning alarm threshold for disk usage. Default is 80.
diskspace_critical=<integer> Critical alarm threshold for disk usage. Default is 90.
cpu_warning=<integer> Warning alarm threshold for CPU usage. Default is 80.
cpu_critical=<integer> Critical alarm threshold for CPU usage. Default is 90.
cpu_steal_warning=<integer> Warning alarm threshold for CPU steal. Default is 10.
cpu_steal_critical=<integer> Critical alarm threshold for CPU steal. Default is 20.
cpu_iowait_warning=<integer> Warning alarm threshold for CPU IO Wait. Default is 50.
cpu_iowait_critical=<integer> Critical alarm threshold for CPU IO Wait. Default is 60.
monitor_cpu_temperature=<boolean integer> Whether to monitor CPU temperature. Default is 0 (false).
redobuffer_warning=<integer> Warning alarm threshold for redo buffer usage. Default is 80.
redobuffer_critical=<integer> Critical alarm threshold for redo buffer usage. Default is 90.
indexmemory_warning=<integer> Warning alarm threshold for index memory usage. Default is 80.
indexmemory_critical=<integer> Critical alarm threshold for index memory usage. Default is 90.
datamemory_warning=<integer> Warning alarm threshold for data memory usage. Default is 80.
datamemory_critical=<integer> Critical alarm threshold for data memory usage. Default is 90.
tablespace_warning=<integer> Warning alarm threshold for table space buffer memory usage. Default is 80.
tablespace_critical=<integer> Critical alarm threshold for table space buffer memory usage. Default is 90.
redolog_warning=<integer> Warning alarm threshold for redo log usage. Default is 80.
redolog_critical=<integer> Critical alarm threshold for redo log usage. Default is 90.
enable_is_queries=<boolean integer> Specifies whether queries to the information_schema will be executed or not. Queries to the information_schema may not be suitable when having many schema objects (100s of databases, 100s of tables in each database, triggers, users, events, sprocs). If disabled, the query that would be executed will be logged so it can be determined if the query is suitable in your environment. Default is 1 (enabled). Disable with 0.
max_replication_lag=<integer> Max allowed replication lag in seconds before sending an Alarm. Default is 10.
enable_icmp_ping=<boolean integer> Toggles if controller shall measure the ICMP ping times to the host. Default is 1 (true).
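
For example, to tighten the default disk and RAM alarm thresholds and allow more replication lag (illustrative values):

diskspace_warning=70
diskspace_critical=85
ram_warning=85
ram_critical=95
max_replication_lag=30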

6.1.4.7. Query Monitor

Option Description
long_query_time=<float> Threshold value for slow query checking. Default 0.5. Example: long_query_time=0.0003.
log_queries_not_using_indexes=<boolean integer> Set query monitor to detect queries not using indexes. Default is 0 (false).
query_monitor_use_local_settings=<boolean integer> Don’t override the local settings of MySQL’s long_query_time and log_queries_not_using_indexes. Default is 0 (false).
db_long_query_time_alarm=<integer> If a query takes longer than db_long_query_time_alarm to execute, an alarm will be raised containing detailed information about blocked and long running transactions. Default is 10 seconds. Example: db_long_query_time_alarm=5.
enable_query_monitor=<integer> Setting for the query monitor interval in seconds. Default is 1, which means enabled. -1 means disabled. Other alias: query_monitor_enabled.
enable_query_monitor_auto_purge_ps=<boolean integer> If enabled, the Performance Schema table events_statements_summary_by_digest will be auto-purged (using TRUNCATE TABLE) every hour. By default this is disabled (0). Enable by setting it to 1 (true). Other alias: query_monitor_auto_purge_ps.
query_monitor_long_running_query_ms=<integer> Raises an alarm if a query executes for longer than this value in milliseconds. Default is 10000. Minimum value is 1000.
query_monitor_alert_long_running_query=<boolean integer> Raises an alarm if a query executes for longer than query_monitor_long_running_query_ms. Default is 0 (disabled). Enable with 1.
query_monitor_kill_long_running_query=<boolean integer> Kill the query if the query executes for longer than query_monitor_long_running_query_ms. Default is 0 (Disabled). Enable with 1.
query_monitor_long_running_query_matching_info=<regex> Match only queries whose 'Info' matches this regex. No default value, which means match any Info.
query_monitor_long_running_query_matching_host=<regex> Match only queries whose 'Host' matches this regex. No default value, which means match any Host.
query_monitor_long_running_query_matching_db=<regex> Match only queries whose 'Db' matches this regex. No default value, which means match any Database.
query_monitor_long_running_query_matching_user=<regex> Match only queries whose 'User' matches this regex. No default value, which means match any User.
query_monitor_long_running_query_matching_command=<regex> Match only queries whose 'Command' matches this regex. Defaults to 'Query'.
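
For example, to raise an alarm for queries running longer than 30 seconds without killing them (illustrative values):

enable_query_monitor=1
query_monitor_long_running_query_ms=30000
query_monitor_alert_long_running_query=1
query_monitor_kill_long_running_query=0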

6.1.4.8. Backup

Option Description
netcat_port=<string> List of netcat/socat ports and port ranges used to stream backups. The first value before a comma is the preferred port. The next value is a port range from which ClusterControl should pick. Defaults to '9999,9990-9998', which means port 9999 will be preferred if available, otherwise the next available port in the defined range is picked. Example: netcat_port=9999,9990-9998.
backup_user=<string> The database username for backups. Example backup_user=backupuser.
backup_user_password=<string> The database password for backup user. Example backup_user_password=MyS3cret.
backup_encryption_key=<string> The AES encryption key used for backups in base64. See Backup Encryption and Decryption for details.
backupdir=<path> The default backup directory, to be pre-filled in ClusterControl UI. Example: backupdir=/storage/backup.
backup_subdir=<string> Set the name of the backup subdirectory. For more details on the formatting, see Backup Subdirectory Variables. Default is "BACKUP-%I". Example: backup_subdir=BACKUP-%I-%D.
backup_retention=<integer> How many days to keep the backups. Backups older than the retention period are removed. Default is 31. Example: backup_retention=15.
backup_cloud_retention=<integer> How many days to keep the backups uploaded to the cloud. Backups older than the retention period are removed. Default is 180. Example: backup_cloud_retention=90.
backup_n_safety_copies=<integer> How many completed full backups will be kept regardless of their retention status. Default is 1. Example: backup_n_safety_copies=3.
datadir_backup_path=<path> During restore/rebuild operations, a backup (filesystem copy) of the existing data directory (datadir) may be performed (the user decides). Unless specified, the default data directory backup location is DATADIR_bak, e.g. /var/lib/mysql_bak if the datadir is /var/lib/mysql.
disable_backup_email=<boolean integer> This setting controls whether emails are sent when a backup finishes or fails. This option is disabled by default, meaning backup emails are sent. Other alias: disable_backup_emails.
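
For example, a backup configuration keeping 15 days of local backups and 90 days of cloud copies could look like this (illustrative values):

backupdir=/storage/backup
backup_subdir=BACKUP-%I
backup_retention=15
backup_cloud_retention=90
backup_n_safety_copies=1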

6.1.4.9. MySQL/MariaDB Nodes

Option Description
mysql_server_addresses=<string> Comma separated list of MySQL hostnames or IP addresses (with or without port is supported). For MySQL Cluster, this should be the list of MySQL API node IP addresses. In case of Galera Cluster, you can add ?slave or ?bvs (backup verification server) to the URL so ClusterControl will register the node accordingly. Example: mysql_server_addresses=192.168.0.11:3306,192.168.0.12:3306,192.168.0.13:3306,192.168.0.14:3306?slave.
monitored_mysql_port=<integer> MySQL port for the managed cluster. ClusterControl assumes all DB nodes are listening on the same port. Default is 3306. Example: monitored_mysql_port=3306. Other aliases: cmon_local_mysql_port, local_mysql_port.
monitored_mysql_root_user=<string> MySQL root user for the managed cluster. ClusterControl assumes all DB nodes are using the same root user. The user must have same privileges as root (SUPER with GRANT OPTION). This is required when you want to scale your cluster by adding a new DB node or replication slave. Default is “root”. Example: monitored_mysql_root_user=dbadmin.
monitored_mysql_root_password=<string> MySQL root password for the managed cluster. ClusterControl assumes all DB nodes are using the same root password. This is required when you want to scale your cluster by adding a new DB node or replication slave. Example: monitored_mysql_root_password='r00tP$@^%sw0rd'.
skip_name_resolve=<boolean integer> Setting to disable name resolution. Default is 1 means skip_name_resolve is enabled.
mysql_version=<string> The MySQL/MariaDB major version. Example: mysql_version=5.7. Other alias: server_version.
galera_port=<integer> Exclusive for Galera Cluster. The Galera communication port to be used. Default is 4567.
galera_version=<string> Exclusive for Galera Cluster. The Galera API major version number. Example: galera_version=3.x.
galera_vendor=<string> Exclusive for Galera Cluster. The database vendor name. Supported values are “percona”, “codership” and “mariadb”. Example: galera_vendor=mariadb.
repl_user=<string> The MySQL replication user. Example: repl_user=repluser.
repl_password=<string> Password for repl_user. Example: repl_password='ZG04Z2Jnk0MUWAZK'.
replication_failover_whitelist=<string> Comma separated list of MySQL slaves which should be used as potential master candidates. If no server on the whitelist is available (up/connected) the failover will fail. If this variable is set, only those hosts will be considered. This parameter takes precedence over replication_failover_blacklist. Example: replication_failover_whitelist=192.168.1.11,192.168.1.12.
replication_failover_blacklist=<string> Comma separated list of MySQL slaves which will never be considered a master candidate. You can use it to list slaves that are used for backups or analytical queries. If the hardware varies between slaves, you may want to put here the slaves which use slower hardware. replication_failover_whitelist takes precedence over this parameter if it is set. Example: replication_failover_blacklist=192.168.1.101,192.168.1.102.
replication_skip_apply_missing_txs=<boolean integer> Default is 0. Skip the check process for additional missing transactions before promoting a slave to a master and just use the most advanced slave. Such process may result in a serious problem though - if an errant transaction is found, replication may be broken. Example: replication_skip_apply_missing_txs=1.
replication_stop_on_error=<boolean integer> Failover/switchover procedures will fail if errors are encountered that may cause data loss. ClusterControl performs the MySQL master switch only once and aborts immediately if the switchover fails, unless the controller is restarted or this variable is set to 0. Enabled (1) by default. Example: replication_stop_on_error=0.
replication_failover_wait_to_apply_timeout=<integer> The candidate waits up to this many seconds to apply outstanding relay logs (retrieved_gtids) before failing over. Default is -1, which means ClusterControl waits indefinitely for it to apply all missing transactions from its relay logs. This is safe, but if for some reason the most up-to-date slave is lagging badly, failover may take hours to complete. If set to 0, failover happens immediately, no matter whether the master candidate is lagging or not. A value higher than 0 means ClusterControl waits for the specified number of seconds before failover happens. Example: replication_failover_wait_to_apply_timeout=0.
replication_auto_rebuild_slave=<boolean integer> If the SQL THREAD is stopped and error code is non-zero then the slave will be automatically rebuilt. 1 means enable, default is 0 (false). Example: replication_auto_rebuild_slave=1.
replication_onfail_failover_script=<path> Path to the failover script on ClusterControl node. This script executes as soon as it has been discovered that a failover is needed. If the script returns non-zero it will force the failover to abort. If the script is defined but not found, the failover will be aborted. Four arguments are supplied to the script: arg1=’all servers’ arg2=’old master’ arg3=’candidate’, arg4=’slaves of old master’ and passed like this: scriptname arg1 arg2 arg3 arg4. The script must be accessible on the controller and executable. Example: replication_onfail_failover_script=/usr/local/bin/failover_script.sh
replication_pre_failover_script=<path> Path to the failover script on ClusterControl node. This script executes before the failover happens, but after a candidate has been elected and it is possible to continue the failover process. If the script returns non-zero it will force the failover to abort. If the script is defined but not found, the failover will be aborted. The script must be accessible on the controller and executable. Example: replication_pre_failover_script=/usr/local/bin/pre_failover_script.sh.
replication_post_failover_script=<path> Path to the failover script on ClusterControl node. This script executes after the failover happened. If the script returns non-zero a Warning will be written in the job log. The script must be accessible on the controller and executable. Example: replication_post_failover_script=/usr/local/bin/post_failover_script.sh.
replication_post_unsuccessful_failover_script=<path> Path to the script on ClusterControl node. This script is executed after the failover attempt failed. If the script returns non-zero a Warning will be written in the job log. The script must be accessible on the controller and executable. Example: replication_post_unsuccessful_failover_script=post_fail_failover_script.sh.
replication_failed_reslave_failover_script=<path> Path to the script on the ClusterControl node. This script is executed after a new master has been promoted, if reslaving of the slaves to the new master fails. If the script returns non-zero, a Warning will be written in the job log. The script must be accessible on the controller and executable. Example: replication_failed_reslave_failover_script=/usr/local/bin/fail_reslave_failover_script.sh.
replication_pre_switchover_script=<path> Path to the switchover script on ClusterControl node. This script executes before the switchover happens. If the script returns non-zero it will force the switchover to fail. If the script is defined but not found, the switchover will be aborted. The script must be accessible on the controller and executable. Example: replication_pre_switchover_script=/usr/local/bin/pre_switchover_failover_script.sh.
replication_post_switchover_script=<path> Path to the switchover script on ClusterControl node. This script executes after the switchover happened. If the script returns non-zero a Warning will be written in the job log. The script must be accessible on the controller and executable. Example: replication_post_switchover_script=/usr/local/bin/post_switchover_failover_script.sh.
replication_check_external_bf_failover=<boolean integer> Before attempting a failover, perform extended checks by checking the slave status to detect if the master is truly down, and also check if ProxySQL (if installed) can still see the master. If the master is detected to be functioning, then no failover will be performed. Default is 0 (false) meaning the checks are disabled. 1 means enable. Example: replication_check_external_bf_failover=0.
replication_check_binlog_filtration_bf_failover=<boolean integer> Before attempting a failover, verify filtration (binlog_do/ignore_db) and replication_* are identically configured on the candidate and the slaves. Default is 0 (false) meaning the checks are disabled. 1 means enable. Example: replication_check_binlog_filtration_bf_failover=1.
replication_failover_events=<boolean integer> Automatically failover events (SLAVESIDE_DISABLED) and enable the event_scheduler after a replication failover/switchover action. Default is disabled. Enabled by setting it to 1. Example: replication_failover_events=1.
auto_manage_readonly=<boolean integer> Enable/disable automatic management of the MySQL server read_only variable. Default is 1 (enabled), which means ClusterControl will set read_only=ON if the MySQL replication role is slave. Example: auto_manage_readonly=0.
schema_change_detection_address=<string> This option must be defined to use the "Operational Report for Schema Change". Creating a report of thousands of database objects (schemas, tables, etc.) will take some time (about 5-10 minutes) depending on the hardware. It is recommended to configure a specific host to run this job, for example a replication slave or an asynchronous slave connected to e.g. a Galera or Group Replication cluster. For NDB, this option should be set to a MySQL server used for admin purposes. Example: schema_change_detection_address=192.168.111.53.
schema_change_detection_pause_time_ms=<integer> Throttle the detection process by pausing for this value in milliseconds. For example, if defined as 3000, ClusterControl pauses the operation every 3 seconds. Example: schema_change_detection_pause_time_ms=3000.
schema_change_detection_databases=<string> Comma-separated string of database names; wildcards are also supported. For example, 'DB%' will evaluate all databases starting with "DB". Example: schema_change_detection_databases=mydb%,shops_db,mymonitoring.
datanode_addresses=<string> Exclusive for MySQL Cluster. Comma separated list of data node hostnames or IP addresses. Example: datanode_addresses=192.168.0.41,192.168.0.42.
mgmnode_addresses=<string> Exclusive for MySQL Cluster. Comma separated list of management node hostnames or IP addresses. Example: mgmnode_addresses=192.168.0.51,192.168.0.52.
ndb_connectstring=<string> Exclusive for MySQL Cluster. NDB connection string for the cluster. Example: ndb_connectstring=192.168.0.51:1186,192.168.0.52:1186.
ndb_binary=<string> Exclusive for MySQL Cluster. NDB binary for data node. Supported values are ndbd or ndbmtd. Example: ndb_binary=ndbmtd.
ndbd_datadir=<path> Exclusive for MySQL Cluster. The datadir of the NDBD nodes. Example: ndbd_datadir=/var/lib/mysqlcluster.
mgmd_datadir=<path> Exclusive for MySQL Cluster. The datadir of the NDB MGMD nodes. Example: mgmd_datadir=/var/lib/mysql.
db_configdir=<string> Exclusive for MySQL Cluster. Directory where configuration files (my.cnf/config.ini) of the cluster are stored. Example: db_configdir=/etc/mysql.
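
For example, a MySQL replication cluster that restricts failover candidates and hooks in a custom failover script could be configured as follows (illustrative values; the script path is hypothetical):

mysql_server_addresses=192.168.0.11:3306,192.168.0.12:3306,192.168.0.13:3306
repl_user=rpl_user
repl_password='ZG04Z2Jnk0MUWAZK'
replication_failover_whitelist=192.168.0.11,192.168.0.12
replication_failover_blacklist=192.168.0.13
replication_onfail_failover_script=/usr/local/bin/failover_script.sh
replication_stop_on_error=1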

6.1.4.10. MongoDB Nodes

Option Description
mongodb_server_addresses=<string> Comma separated list of MongoDB shard or replica IP addresses with port. Example: mongodb_server_addresses=192.168.0.11:27017,192.168.0.12:27017,192.168.0.13:27017.
mongoarbiter_server_addresses=<string> Comma separated list of MongoDB arbiter IP addresses with port. Example: mongoarbiter_server_addresses=192.168.0.11:27019,192.168.0.12:27019,192.168.0.13:27019.
mongocfg_server_addresses=<string> Comma separated list of MongoDB config server IP addresses with port. Example: mongocfg_server_addresses=192.168.0.11:27019,192.168.0.12:27019,192.168.0.13:27019.
mongos_server_addresses=<string> Comma separated list of MongoDB mongos IP addresses with port. Example: mongos_server_addresses=192.168.0.11:27017,192.168.0.12:27017,192.168.0.13:27017.
mongodb_basedir=<path> Location of the MongoDB base directory, used to find MongoDB client related binaries. Example: mongodb_basedir=/usr.
mongodb_user=<string> MongoDB admin/root username. Example: mongodb_user=root.
mongodb_password=<string> Password for mongodb_user. Example: mongodb_password='kPo123^^#*'.
mongodb_authdb=<string> The database containing user information to use for authentication. Default is admin. Example: mongodb_authdb=admin.
mongodb_cluster_key=<path> The key file that the cluster's nodes use to authenticate to each other. Example: mongodb_cluster_key=/etc/repl.key.

6.1.4.11. PostgreSQL/TimescaleDB Nodes

Option Description
postgresql_server_addresses=<string> Comma separated list of PostgreSQL instances with port. Example: postgresql_server_addresses=192.168.10.100:5432,192.168.10.101:5432. Other alias: postgre_server_addresses.
postgresql_user=<string> The PostgreSQL admin user name. Default is postgres. Example: postgresql_user=postgres. Other alias: postgre_user.
postgresql_password=<string> The PostgreSQL admin password. Example: postgresql_password='p4s$^#0rd123'. Other alias: postgre_password.
wal_retention_hours=<integer> Retention in hours to erase old WAL archive logs for PITR. Default is 0, which means WAL archive logs are kept forever. Other alias: pitr_retention_hours.
auto_manage_readonly=<boolean integer> Enable/disable automatic management of the PostgreSQL server transaction_read_only variable. Default is 1 (enabled), which means ClusterControl will set transaction_read_only=ON if the PostgreSQL replication role is slave. Example: auto_manage_readonly=0.
replication_auto_rebuild_slave=<boolean integer> If the SQL THREAD is stopped and error code is non-zero then the slave will be automatically rebuilt. 1 means enable, default is 0 (false). Example: replication_auto_rebuild_slave=1.
repl_user=<string> The PostgreSQL replication user. Example: repl_user=repluser.
repl_password=<string> Password for repl_user. Example: repl_password='ZG04Z2Jnk0MUWAZK'.
replication_failover_whitelist=<string> Comma separated list of PostgreSQL slaves which should be used as potential master candidates. If no server on the whitelist is available (up/connected) the failover will fail. If this variable is set, only those hosts will be considered. This parameter takes precedence over replication_failover_blacklist. Example: replication_failover_whitelist=192.168.1.11,192.168.1.12.
replication_failover_blacklist=<string> Comma separated list of PostgreSQL slaves which will never be considered a master candidate. You can use it to list slaves that are used for backups or analytical queries. If the hardware varies between slaves, you may want to put here the slaves which use slower hardware. replication_failover_whitelist takes precedence over this parameter if it is set. Example: replication_failover_blacklist=192.168.1.101,192.168.1.102.
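
For example, a PostgreSQL streaming replication setup that keeps WAL archives for PITR could use the following options (illustrative values):

postgresql_server_addresses=192.168.10.100:5432,192.168.10.101:5432
postgresql_user=postgres
postgresql_password='p4s$^#0rd123'
repl_user=repluser
repl_password='ZG04Z2Jnk0MUWAZK'
wal_retention_hours=72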

6.1.5. Management and Deployment Operations

ClusterControl performs management and deployment jobs by pushing remote commands via SSH to the target node. Users only need to install the CMON Controller package on the ClusterControl host and make sure that passwordless SSH and the CMON database user grants are properly set up on each of the managed hosts. This mechanism requires no agent, which simplifies the setup of the whole infrastructure. Agent-based operation is only supported for monitoring jobs, as described in the next section.

6.1.6. Monitoring Operations

Generally, ClusterControl performs its monitoring, alerting and trending duties in the following four ways:

  1. SSH - Host metrics collection using SSH library.
  2. Prometheus - Host and database metrics collection using Prometheus server and exporters.
  3. Database client - Database metrics collection using the CMON database client library.
  4. Advisor - Mini programs written using ClusterControl DSL and running within ClusterControl itself, for monitoring, tuning and alerting purposes.

Starting from version 1.7.0, ClusterControl supports two methods of monitoring operation:

  1. Agentless monitoring (default).
  2. Agent-based monitoring with Prometheus.

The monitoring method is not a global configuration; it is bound per cluster. This allows you to have two different database clusters configured with two different monitoring methods simultaneously. For example, Cluster A uses SSH sampling while Cluster B uses a Prometheus agent-based setup to gather host monitoring data.

Regardless of the monitoring method chosen, database and load balancer (except HAProxy) metrics are still sampled agentlessly by CMON's database client library and stored inside the CMON database for reporting (alarms, notifications, operational reports) and for accurate management decisions in critical operations like failover and recovery. That said, with agent-based monitoring, ClusterControl does not use SSH to sample host metrics, which can be excessive in some environments.

Caution

ClusterControl allows you to switch between agentless and agent-based monitoring per cluster. However, you will lose the monitoring data each time you do this.

6.1.6.1. Agentless Monitoring

For host and load balancer stats collection, ClusterControl executes this task via SSH with super-user privileges. Therefore, passwordless SSH with super-user privileges is vital to allow ClusterControl to run the necessary commands remotely with proper escalation. This pull approach has a couple of advantages compared to the agent-based monitoring method:

  • Agentless - There is no need for an agent to be installed, configured and maintained.
  • Unified management and monitoring configuration - SSH can be used to pull monitoring metrics or push management jobs to the target nodes.
  • Simplified deployment - The only requirement is a proper passwordless SSH setup. SSH is also secure and encrypted.
  • Centralized setup - One ClusterControl server can manage multiple servers and clusters, provided it has sufficient resources.

However, there are also drawbacks with the agentless monitoring approach, a.k.a. the pull mechanism:

  • The monitoring data is accurate only from the ClusterControl perspective. For example, if there is a network glitch and ClusterControl loses communication to the monitored host, the sampling will be skipped until the next available cycle.
  • For high-granularity monitoring, there will be network overhead due to the increased sampling rate, where ClusterControl needs to establish more connections to every target host.
  • ClusterControl will keep attempting to re-establish a connection to the target node, because it has no agent to do this on its behalf.
  • Redundant data sampling occurs if you have more than one ClusterControl server monitoring a cluster, since each ClusterControl server has to pull the monitoring data for itself.

The above points are the reasons we introduced agent-based monitoring, as described in the next section.

6.1.6.2. Agent-based Monitoring

Starting from version 1.7.0, ClusterControl introduced an agent-based monitoring integration with Prometheus. Other operations like management, scaling and deployment are still performed through the agentless approach as described in Management and Deployment Operations. Agent-based monitoring eliminates excessive SSH connections to the monitored hosts and offloads the monitoring jobs to a dedicated monitoring system like Prometheus.

With the agent-based configuration, you can use a set of new dashboards that use Prometheus as the data source, giving access to its flexible query language and multi-dimensional data model with time series data identified by metric name and key/value pairs. Simply put, in this configuration ClusterControl integrates with Prometheus to retrieve the collected monitoring data and visualize it in the ClusterControl UI, much like a GUI client for Prometheus. ClusterControl also connects to the exporters via HTTP GET and POST methods to determine the process state for process management purposes. For the list of Prometheus exporters, see Monitoring Tools.

One Prometheus data source can be shared among multiple clusters within ClusterControl. You have the option to deploy a new Prometheus server or import an existing one, under ClusterControl > Dashboards > Enable Agent Based Monitoring.

Attention

Importing an external Prometheus host (one that was not deployed by ClusterControl) is not supported at the moment due to the possibility of incompatible data structures exposed by the Prometheus exporters.

6.1.6.3. Monitoring Tools

6.1.6.3.1. Agentless

For the agentless monitoring mode, ClusterControl's monitoring duty only requires the OpenSSH server package on the monitored hosts. ClusterControl uses the libssh client library to collect host metrics from the monitored hosts - CPU, memory, disk usage, network, disk IO, processes, etc. The OpenSSH client package is required on the ClusterControl host only for setting up passwordless SSH and for debugging purposes. Other SSH implementations like Dropbear and TinySSH are not supported.

6.1.6.3.2. Agent-based

For the agent-based monitoring mode, ClusterControl requires a Prometheus server running on port 9090, and all monitored nodes must be configured with at least three exporters (depending on the node's role):

  1. Process exporter (port 9011)

  2. Node/system metrics exporter (port 9100)

  3. Database or application exporters, depending on the cluster type (see Monitoring Tools).

On every monitored host, ClusterControl will configure and daemonize the exporter process using a program called daemon. Thus, the ClusterControl host is recommended to have an Internet connection to install the necessary packages and automate the Prometheus deployment. For offline installation, the packages must be pre-downloaded into /var/cache/cmon/packages on the ClusterControl node. For the list of required packages and links, please refer to /usr/share/cmon/templates/packages.conf. Apart from the Prometheus scrape process, ClusterControl also connects to the process exporter directly via HTTP calls to determine the process state of the node. No sampling via SSH is involved in this process.
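
To verify that an exporter on a monitored host is reachable, its metrics endpoint can be queried over HTTP, for example (host address and ports are illustrative, matching the defaults listed above):

$ curl -s http://192.168.0.11:9100/metrics | head
$ curl -s http://192.168.0.11:9011/metrics | head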

Note

With agent-based monitoring, ClusterControl depends on a working Prometheus for accurate reporting of management and monitoring data. Therefore, Prometheus and exporter processes are managed by an internal process manager thread. A non-working Prometheus will have a significant impact on the CMON process.

Since ClusterControl 1.7.3 allows multiple instances per single host, it will automatically configure a different exporter port when there is more than one process of the same type to monitor, avoiding port conflicts by incrementing the port number for every instance. Suppose you have two ProxySQL instances deployed by ClusterControl and you would like to monitor them both via Prometheus: ClusterControl will configure the first ProxySQL exporter to run on the default port, 42004, while the second ProxySQL exporter will be configured with port 42005, incremented by 1.

The collector flags are configured based on the node’s role, as shown in the following table (some exporters do not use collector flags):

Exporter Collector Flags
mysqld_exporter
  • collect.info_schema.processlist
  • collect.info_schema.tables
  • collect.info_schema.innodb_metrics
  • collect.global_status
  • collect.global_variables
  • collect.slave_status
  • collect.perf_schema.tablelocks
  • collect.perf_schema.eventswaits
  • collect.perf_schema.file_events
  • collect.perf_schema.file_instances
  • collect.binlog_size
  • collect.perf_schema.tableiowaits
  • collect.perf_schema.indexiowaits
  • collect.info_schema.tablestats
node_exporter arp, bcache, bonding, conntrack, cpu, diskstats, edac, entropy, filefd, filesystem, hwmon, infiniband, ipvs, loadavg, mdadm, meminfo, netdev, netstat, nfs, nfsd, sockstat, stat, textfile, time, timex, uname, vmstat, wifi, xfs, zfs

6.1.6.3.3. Database Client Libraries

When gathering database stats and metrics, regardless of the monitoring method, the ClusterControl Controller (CMON) connects to the database servers directly via database client libraries - libmysqlclient (MySQL/MariaDB and ProxySQL), libpq (PostgreSQL) and libmongoc (MongoDB). That is why it is crucial to set up proper privileges for the ClusterControl server from the database servers' perspective. For MySQL-based clusters, ClusterControl requires the database user "cmon", while for other databases any username can be used for monitoring, as long as it is granted super-user privileges. Most of the time, ClusterControl will set up the required privileges (or use the specified database user) automatically during the cluster import or cluster deployment stage.

6.1.6.3.4. Load Balancers

For load balancers, ClusterControl requires the following additional tools:

  • Maxadmin on the MariaDB MaxScale server.
  • netcat and/or socat on the HAProxy server to connect to HAProxy socket file.
  • mysql client on the ProxySQL server.

6.1.6.4. Agentless vs Agent-based Architecture

The following diagram summarizes both host and database monitoring processes executed by ClusterControl using libssh and database client libraries (agentless approach):

_images/cc_monitoring_agentless_171.png

The following diagram summarizes both host and database monitoring processes executed by ClusterControl using Prometheus and database client libraries (agent-based approach):

_images/cc_monitoring_agent_based_171.png

6.1.6.5. Timeouts and Intervals

ClusterControl Controller (CMON) is a multi-threaded process. For agentless monitoring, the ClusterControl Controller sampling thread connects via SSH to each monitored host once and maintains a persistent connection (hence, no timeout) until the host drops or disconnects it when sampling host stats. It may establish more connections depending on the jobs assigned to the host, since most management jobs run in their own threads. For example, cluster recovery runs on the recovery thread, Advisor execution runs on a cron thread, and process monitoring runs on the process collector thread.

For agent-based monitoring, the scrape interval and data retention period depend on the Prometheus settings.

The ClusterControl monitoring thread performs its sampling operations at the following intervals; a configuration sketch follows the table:

Metrics Interval
MySQL query/status/variables Every second
Process collection (/proc) Every 10 seconds
Server detection Every 10 seconds
Host (/proc, /sys) Every 30 seconds (configurable via host_stats_collection_interval)
Database (PostgreSQL and MongoDB only) Every 30 seconds (configurable via db_stats_collection_interval)
Database schema Every 3 hours (configurable via db_schema_stats_collection_interval)
Load balancer Every 15 seconds (configurable via lb_stats_collection_interval)
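
If you need to tune these intervals, the options named in the table go into the CMON configuration file; a change typically requires a cmon restart to take effect. A minimal sketch, assuming the values are expressed in seconds and using the defaults from the table:

# /etc/cmon.cnf
host_stats_collection_interval=30
db_stats_collection_interval=30
db_schema_stats_collection_interval=10800
lb_stats_collection_interval=15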

The Advisors (imperative scripts), which can be created, compiled, tested and scheduled directly from the ClusterControl UI under Manage -> Developer Studio, can make use of SSH and database client libraries for monitoring, data processing and alerting within the ClusterControl domain, with the following restrictions:

  • 5 seconds of hard time limit for SSH execution,
  • 10 seconds of default time limit for database connections, configurable via net_read_timeout, net_write_timeout and connect_timeout in the CMON configuration file (see the sketch after this list),
  • 60 seconds of total script execution time limit before CMON ungracefully aborts it.
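
The database connection time limits mentioned above are plain CMON configuration options. A small sketch, assuming the values are given in seconds:

# /etc/cmon.cnf
net_read_timeout=10
net_write_timeout=10
connect_timeout=10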

Short-interval monitoring data like MySQL queries and status are stored directly in the CMON database, while long-interval monitoring data such as weekly/monthly/yearly data points are aggregated every 60 seconds and stored in memory for 10 minutes. These behaviours are not configurable due to the architecture design.

6.1.7. CMON Database

The CMON database is the persistent store for all monitoring data collected from the managed nodes, as well as all ClusterControl metadata (e.g. jobs in the queue, backup schedules, backup statuses, etc.). The CMON Controller requires a MySQL database running on mysql_hostname as defined in the CMON configuration file. The database name and user are both ‘cmon’ and are immutable. The CMON database dump files are shipped together with the CMON Controller package and can be found under /usr/share/cmon once it is installed.

The MySQL user ‘cmon’ needs proper access to the CMON database, which is set up by performing the following grants:

Grant all privileges to ‘cmon’ at the hostname value (as defined in the CMON configuration file) on the ClusterControl host:

GRANT ALL PRIVILEGES ON *.* TO 'cmon'@'{hostname}' IDENTIFIED BY '{mysql_password}' WITH GRANT OPTION;

Grant all privileges to ‘cmon’ at 127.0.0.1 on the ClusterControl host:

GRANT ALL PRIVILEGES ON *.* TO 'cmon'@'127.0.0.1' IDENTIFIED BY '{mysql_password}' WITH GRANT OPTION;

On every managed database server, grant all privileges to ‘cmon’ at the controller’s hostname value (as defined in the CMON configuration file):

GRANT ALL PRIVILEGES ON *.* TO 'cmon'@'{hostname}' IDENTIFIED BY '{mysql_password}' WITH GRANT OPTION;

If you deploy a cluster using the ClusterControl deployment wizard, the above GRANTs are configured automatically.
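
To verify the grants from the ClusterControl host, a simple connection test can be used; the managed node address below is an example:

# On the ClusterControl host, against the CMON database
$ mysql -ucmon -p -h 127.0.0.1 cmon -e "SELECT 1;"

# From the ClusterControl host, against a managed database node
$ mysql -ucmon -p -h 10.0.1.20 -e "SELECT 1;"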

6.2. ClusterControl UI

ClusterControl UI provides a modern web user interface to visualize the cluster and perform tasks like backup scheduling, configuration changes, adding nodes, rolling upgrades, etc. It requires a MySQL database called ‘dcps’ to store cluster information, users, roles and settings. It interacts with the CMON controller via remote procedure call (RPC) on port 9500, or on port 9501 for RPC with TLS (default).

The ClusterControl UI page can be accessed through the following URL:

https://ClusterControl IP address or hostname/clustercontrol

The ClusterControl UI runs on Apache and is located under /var/www/html/clustercontrol (RedHat/CentOS/Ubuntu >14.04) or /var/www/clustercontrol (Debian <8/Ubuntu <14.04). The web server must support a rule-based rewrite engine and must be able to follow symlinks.
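
A quick sanity check on the Apache side is to confirm that the rewrite module is loaded; depending on the distribution, the control binary may be apachectl or apache2ctl:

$ apachectl -M | grep rewrite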

Please refer to ClusterControl UI for the functionalities available in the ClusterControl UI.

6.3. ClusterControl SSH

This optional package is introduced in ClusterControl v1.4.2. It provides the ability to open an SSH console to any of your cluster hosts directly from the ClusterControl UI. This can be very useful if you need to quickly log into a database server and access the command line. The package installs a binary called cmon-ssh located under the /usr/sbin directory, which by default listens on port 9511 on the ClusterControl node. It interacts directly with the target host via the SSH protocol using the credentials (os_user and ssh_identity) configured when deploying or importing the cluster into ClusterControl.

The SSH module needs to be enabled in order to use this feature. If the package is installed directly via the package manager, the required steps are configured automatically. The steps are:

  1. Enable the SSH module inside clustercontrol/bootstrap.php file:
define('SSH_ENABLED', true);
  2. Set up the RewriteRule inside Apache configuration file (above the <Directory/> definitions):
# ClusterControl SSH
RewriteEngine On
RewriteRule ^/clustercontrol/ssh/term$ /clustercontrol/ssh/term/ [R=301]
RewriteRule ^/clustercontrol/ssh/term/ws/(.*)$ ws://127.0.0.1:9511/ws/$1 [P,L]
RewriteRule ^/clustercontrol/ssh/term/(.*)$ http://127.0.0.1:9511/$1 [P]
  3. Enable the following Apache modules:
a2enmod proxy proxy_http proxy_wstunnel

Communication is based on HTTPS, so it is possible to access your servers from behind a firewall that restricts Internet access to only port 443. Access to WebSSH is configurable by the ClusterControl admin through the GUI.
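
Once enabled, you can confirm that the web SSH service is up by checking that something is listening on port 9511 on the ClusterControl node:

$ sudo netstat -tlnp | grep 9511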

Warning

ClusterControl does not provide extra layers of authentication and authorization when accessing the cluster from the web-based SSH terminal. A user who has access to the cluster in the ClusterControl UI may be capable of accessing the terminal as a privileged user. Use Access Control to limit this.

6.4. ClusterControl Notifications

This optional package is introduced in ClusterControl v1.4.2, deprecating the previous ClusterControl NodeJS package (which served the same purpose). Alarms and events can now easily be sent to incident management services like PagerDuty, VictorOps and OpsGenie. You can also run any command available in the ClusterControl CLI from your CCBot-enabled chat services like Slack and Telegram. Additionally, it provides a generic webhook if you want to integrate with other services to act on status changes in your clusters. The direct connections with these popular incident communication services allow you to customize how you are alerted by ClusterControl when something goes wrong with your database environments.

The package installs a binary called cmon-events located under the /usr/sbin directory, which by default listens on port 9510 on the ClusterControl node.

Additional resources on setting up integration with third-party tools are listed below:

6.5. ClusterControl Cloud

This optional package is introduced in ClusterControl v1.5. The package name is clustercontrol-cloud and it provides a UI extension in ClusterControl under Integrations. After adding the cloud provider credentials, you can perform basic instance operations directly from the ClusterControl UI, such as listing, starting, stopping and deleting the existing instances. See Integrations for details.

This package installs the following new files:
  • /etc/cron.d/cmon-cloud - Cron job to monitor cmon-cloud process.
  • /etc/rc.d/init.d/cmon-cloud - Sysvinit script.
  • /etc/systemd/system/cmon-cloud.service - Systemd unit file.
  • /usr/sbin/cmon-cloud - The executable binary file.

By default, this service uses port 9518 on the localhost interface. Cloud credentials are stored under /var/lib/cmon/ with permissions restricted to root only.
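
To confirm that the cloud service is running and bound to its default port, you can check the installed systemd unit and the listening socket on the ClusterControl node:

$ systemctl status cmon-cloud
$ sudo netstat -tlnp | grep 9518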

Note

ClusterControl Cloud is a reintroduction of a feature called ‘Service Providers’, available in ClusterControl 1.4 and older.

6.6. ClusterControl Cloud File Manager (CLUD)

This optional package is complementary to ClusterControl Cloud and is introduced in ClusterControl v1.5. The package name is clustercontrol-clud (an abbreviation of CLoud Upload Download) and it is the command line interface for ClusterControl to interact with the cloud providers when uploading and downloading backups. You can think of it as a cloud file manager. No extra configuration is required.

This package installs the following file:
  • /usr/sbin/clud - The executable binary file.

6.6.1. Usage

The general synopsis to execute commands using clud is:

clud {global options} command {command option} {arguments...}

Command

Name, shorthand Description
upload Upload a file to the cloud.
download Download a file from the cloud.
rm Delete a file on the cloud.
clouds List supported cloud providers.
help, h Shows a list of commands or help for one command.

Global Options

Name, shorthand Description
--cloud Use this cloud for the action.
--service Use this service within the cloud.
--credentials Raw credentials JSON.
--credentials-file Take credentials from this file instead of using STDIN input.
--auto-create-bucket Use this flag if the bucket should be auto-created if it does not exist.
--help, -h Show help.
--version Print the version.
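
Since clud is normally invoked by CMON itself during backup upload and download, running it manually is mostly useful for troubleshooting. For example, to list the supported cloud providers, or to read the help of a specific command (assuming help accepts a command name, as suggested by the table above):

$ clud clouds
$ clud help upload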

6.7. ClusterControl CLI

Also known as s9s-tools, this optional package is introduced in ClusterControl version 1.4.1 and contains a binary called s9s. It is a command line tool to interact with, control and manage database clusters using the ClusterControl Database Platform. Starting from version 1.4.1, the installer script will automatically install this package on the ClusterControl node. You can also install it on another computer or workstation to manage the database cluster remotely. Communication between this client and the CMON controller is encrypted and secured through TLS. This command line project is open source and publicly available on GitHub.

ClusterControl CLI opens a new door for cluster automation where you can easily integrate it with existing deployment automation tools like Ansible, Puppet, Chef or Salt. The following list shows the supported features at the moment:

  • Deploy and manage database clusters:
    • MySQL
    • PostgreSQL
    • MongoDB (Replica Set)
  • Basic monitoring features:
    • Status of nodes and clusters.
    • Cluster properties can be extracted.
    • Provides sufficiently detailed information about your clusters.
  • Management features:
    • Create clusters.
    • Add existing clusters.
    • Stop or start clusters.
    • Add or remove nodes.
    • Restart nodes in the cluster.
    • Create database users (CREATE USER, GRANT privileges to user).
    • Create load balancers (HAProxy and ProxySQL are supported).
    • Create and restore backups.
    • Maintenance mode.
    • Configuration changes of db nodes.

The command line tool is invoked by executing a binary called s9s. The commands are essentially JSON messages sent over to the ClusterControl Controller (CMON) RPC interface. Communication between s9s (the command line tool) and the cmon process (ClusterControl Controller) is encrypted using TLS and requires port 9501 to be open on the controller and reachable from the client host.
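
A few illustrative invocations are shown below; the cluster ID is an example and the full command reference is available via s9s --help:

$ s9s cluster --list --long
$ s9s node --list --long --cluster-id=1
$ s9s job --list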

6.7.1. Installation

We have built an installer script for s9s-tools available at http://repo.severalnines.com/s9s-tools/install-s9s-tools.sh.

On ClusterControl host (or any client host):

$ wget http://repo.severalnines.com/s9s-tools/install-s9s-tools.sh
$ chmod 755 install-s9s-tools.sh
$ ./install-s9s-tools.sh

If you would like to install it manually, please refer to the next section, Package Manager (yum/apt).

6.7.1.1. Package Manager (yum/apt)

The package list is available at s9s-tools repository page.

6.7.1.1.1. RHEL/CentOS

The repository file for each distribution can be downloaded directly from the s9s-tools repository page.

Installation steps are straightforward:

# CentOS 7
$ wget http://repo.severalnines.com/s9s-tools/CentOS_7/s9s-tools.repo -P /etc/yum.repos.d
$ yum install s9s-tools
$ s9s --help

6.7.1.1.2. Debian/Ubuntu DEB Repositories

The repository file for each distribution can be downloaded directly from the s9s-tools repository page.

To install, one would do:

$ wget -qO - http://repo.severalnines.com/s9s-tools/$(lsb_release -sc)/Release.key | sudo apt-key add -
$ echo "deb http://repo.severalnines.com/s9s-tools/$(lsb_release -sc)/ ./" | sudo tee /etc/apt/sources.list.d/s9s-tools.list
$ sudo apt-get update
$ sudo apt-get install s9s-tools
$ s9s --help

6.7.1.2. Compile From Source

To build from source, you may require additional packages and tools to be installed:

  1. Get the source code from GitHub:
$ git clone https://github.com/severalnines/s9s-tools.git
  2. Navigate to the source code directory:
$ cd s9s-tools
  3. You may need to install development packages such as a C/C++ compiler, autotools, openssl-devel, etc.:
# RHEL/CentOS
$ yum groupinstall "Development Tools"
$ yum install automake git openssl-devel

# Ubuntu/Debian
$ sudo apt-get install build-essential automake git libssl-dev byacc flex bison
  4. Compile the source code:
$ ./autogen.sh
$ ./configure
$ make
$ make install
$ s9s --help

It is possible to build the s9s command line client on Linux and Mac OS X.

6.7.2. Configuration

The first thing that must be done is to create a user that is allowed to connect to and use the controller. Communication between the s9s command line client and the controller (the cmon process) is encrypted using TLS on port 9501. A public and private RSA key pair associated with a username is used to encrypt the communication. The s9s command line client is responsible for setting up the user and the required private and public keys.

The command line client can be located on the same server as the controller (localhost communication) or on a remote server. The configuration differs depending on the location - localhost or remote access, and both cases are covered below.

6.7.2.1. Localhost Access

SSH into the controller, then create a user called ‘dba’ that is allowed to access the controller. This will create the first user.

Important

All users have the rights to perform all operations on the managed clusters. There is no access control or roles implemented at the moment. However, the user must be an authenticated user to be able to access the controller.

$ s9s user --create --generate-key --controller="https://localhost:9501" --group=admins dba
Grant user 'dba' succeeded.

If this is the first time you use the s9s client, then a new directory has been created in $HOME/.s9s/ storing the private/public key and a configuration file.

Let us see what has been created:

$ ls $HOME/.s9s/
dba.key  dba.pub  s9s.conf

Viewing the configuration file we will see:

[global]
cmon_user              = dba
# controller_host_name = localhost
# controller_port      = 9500
# rpc_tls              = false

Now we need to set controller_host_name, controller_port and rpc_tls so that the file looks like this:

[global]
cmon_user            = dba
controller_host_name = localhost
controller_port      = 9501
rpc_tls              = true
# controller = "https://localhost:9501"

To verify it is working, you can list the available clusters:

$ s9s cluster --list
cluster_1 cluster_2 cluster_3

If the authentication fails you will see messages like:

Authentication failed: User 'dba' is not found.

The above means the user has not been created. If there is a problem connecting to cmon, you will instead see:

Authentication failed: Connect to localhost:9501 failed: "{{ Failure message }}".

In this case, double-check the ~/.s9s/s9s.conf file and verify that cmon has been started with TLS:

$ sudo grep -i tls /var/log/cmon.log
2016-11-28 15:00:31 : (INFO) Server started at tls://127.0.0.1:9501

And also:

$ sudo netstat -atp | grep 9501
tcp        0      0 localhost:9501          *:*                     LISTEN      22096/cmon

To view the users and see which one is currently in use (marked with an “A”, a short form of “Authenticated”):

$ s9s user --list --long
A ID UNAME      GNAME  EMAIL REALNAME
-  1 system     admins -     System User
-  2 nobody     nobody -     -
A  3 dba        users  -     -
-  4 remote_dba users  -     -

The ‘nobody’ user is a legacy user. No one should ever see a job issued by the user ‘nobody’. The ‘system’ user is the ClusterControl server itself, which creates internal jobs (e.g. internal cron jobs).

6.7.2.2. Remote Access

The steps to set up the s9s command line client for remote access are similar to those for localhost, except:

  • The s9s command line client must exist on the remote server.
  • The controller (cmon) must accept TLS connections from the remote server.
  • The remote server can connect to the controller with key-based authentication (no password). This is required only during the user creation (private/public key setup).
  1. Set up the bind address of the cmon process as follows:
$ vi /etc/init.d/cmon

Locate the line:

RPC_BIND_ADDRESSES=""

And change to:

RPC_BIND_ADDRESSES="127.0.0.1,10.0.1.12"

Here we assume the public IP address of the controller is 10.0.1.12.

Attention

Naturally, you should lock down this IP with firewall rules only allowing access from the remote servers you wish to access the controller from.

  2. Restart the controller and check the log:
$ service cmon restart # sysvinit
$ systemctl restart cmon # systemd
  3. Verify the ClusterControl Controller is listening on the configured IP address on port 9501:
$ cat /var/log/cmon.log | grep -i tls
2016-11-29 12:34:04 : (INFO) Server started at tls://127.0.0.1:9501
2016-11-29 12:34:04 : (INFO) Server started at tls://10.0.1.12:9501
  4. On the remote server/computer, enable key-based authentication and create a user called ‘remote_dba’. First, create the system user:
$ useradd -m remote_dba
  5. As the current user (for example root or a sudoer) on the remote server, set up passwordless SSH to the ClusterControl host. Generate an SSH key if you do not have one:
$ ssh-keygen -t rsa # press 'Enter' for all prompts

Copy the SSH public key to the ClusterControl Controller host, for example 10.0.1.12:

$ ssh-copy-id root@10.0.1.12
  6. Create the s9s client user:
$ s9s user --generate-key --create --group=admins --controller="https://10.0.1.12:9501" remote_dba
Warning: Permanently added '10.0.1.12' (ECDSA) to the list of known hosts.
Connection to 10.0.1.12 closed.
Grant user 'remote_dba' succeeded.
  7. Ensure the config file located at ~/.s9s/s9s.conf looks like this (note that the IP of the controller may be different):
[global]
cmon_user            = remote_dba
controller_host_name = 10.0.1.12
controller_port      = 9501
rpc_tls              = true
  8. Finally, test the connection:
$ s9s cluster --list
cluster_1 cluster_2 cluster_3

For more details on the usage, see ClusterControl CLI.