blog

MySQL Load Balancing with ProxySQL – An Overview

Krzysztof Ksiazek

Published: March 7, 2016
Last Updated: May 4, 2022

Year 2015 was a good year for proxies in MySQL environments. First, MariaDB released a GA version of MaxScale, a new proxy that understands MySQL Replication and Galera topologies and supports read-write splitting. Later in the year, at Percona Live Europe in Amsterdam, ProxySQL was announced as GA by its creator, René Cannaò.

ProxySQL was created for DBAs by René, himself a DBA trying to solve issues when working with complex replication topologies.

In this blog, we’ll give you an overview of ProxySQL, and show you how to install and configure it.

Installation of ProxySQL

Initial installation is pretty simple – all you need to do is grab the latest binaries from the site:

https://github.com/sysown/proxysql/releases/

and install them using rpm or dpkg:

root@cmon:~# wget https://github.com/sysown/proxysql/releases/download/v1.1.1-beta.6/proxysql_1.1.1-ubuntu14_amd64.deb
root@cmon:~# dpkg -i proxysql_1.1.1-ubuntu14_amd64.deb

Then, we can start the service.

root@cmon:~# service proxysql start
Starting ProxySQL: DONE!

At first start, ProxySQL loads an initial configuration which allows you to connect to it and configure it using MySQL client and an SQL-like language. By default, you will use user ‘admin’ with a password ‘admin’. It can be changed in the configuration file, /etc/proxysql.cnf, but it has to be done before the first start of the ProxySQL – this file is read only at the first start or when you explicitly initialize the config through:

service proxysql initial

Any further changes, including changing user and password for the CLI, will have to be made using the CLI and won’t be mirrored in the /etc/proxysql.cnf file.

It doesn’t mean there’s no way to store configuration persistently – ProxySQL uses a SQLite database for that – more details will follow later in this post.

By default, ProxySQL management interface should be listening on port 6032. You can use a standard MySQL command line client to connect to it:

root@cmon:~# mysql -u admin -padmin -h 127.0.0.1 -P6032

From here, we’ll be using an SQL-like language to configure the proxy and check its metrics. This may seem odd at the beginning, but for DBAs, using SQL to configure software should not pose a problem. Additionally, as ProxySQL is reachable using MySQL client, making changes in the configuration or checking the status of the proxy from external tools is rather easy – you can connect to the proxy in a same way you’d connect to MySQL.

Configuring ProxySQL

Before we proceed with configuration, we’ll need to understand another important concept. ProxySQL lets you introduce configuration changes at runtime. The exact behavior is slightly different to what you see in, e.g., MySQL – it’s more similar to what you’d see in a Cisco router or switch.

There are three ‘layers’ of configuration – first you edit some settings but this change is not applied immediately. You need to explicitly load them to runtime. Next, if you want to store them on persistent storage, you also have to do it explicitly. This sounds complex but it’s actually very useful – it helps to implement complex changes in the proxy configuration.

Let’s say you need to redesign your query routing rules and make some complex changes in how some of the variables are set. Usually it should be possible to apply those modifications one by one, but it may happen that such approach causes additional troubles. It’s similar to what we are used to in SQL – most of the queries can be executed on their own but from time to time, larger, more complex transactions are needed.

After we apply the change to runtime and verify it works correctly, you can to store them on disk, for safe, persistent store. If anything goes wrong, you can easily rollback the changes by loading the old configuration from disk and applying it to runtime.

Let’s take a closer look at the CLI and how the whole configuration looks like.

mysql> show databases;
+-----+---------+-------------------------------+
| seq | name    | file                          |
+-----+---------+-------------------------------+
| 0   | main    |                               |
| 2   | disk    | /var/lib/proxysql/proxysql.db |
| 3   | stats   |                               |
| 4   | monitor |                               |
+-----+---------+-------------------------------+
4 rows in set (0.00 sec)

As you can see, we have couple of ‘databases’ defined in the ProxySQL. ‘main’ contain current configuration – if you modify it by means of running SQL, changes will be shown here even before you apply them to the runtime configuration.

‘disk’ schema is a persistent, on-disk store. Settings that you see here had been explicitly stored ‘on disk’.

‘stats’ schema is generated ‘on the fly’, from the data collected and stored in the proxysql process. You can check here different statistics regarding the proxy itself but also traffic that’s going through it. Finally, we have the ‘monitor’ schema which presents data collected by the monitoring module. We will discuss exactly how ProxySQL monitors hosts later in this blog post. But before we do this, let’s go over all schemas available in ProxySQL and see what can we find there.

‘main’ and ‘disk’ schemas

Those schemas contain the configuration of ProxySQL and tables are the same in both of them. We won’t discuss every option in there, but we’ll do our best to cover the most important settings available.

mysql> show tables from main;
+------------------------------+
| tables                       |
+------------------------------+
| mysql_servers                |
| mysql_users                  |
| mysql_replication_hostgroups |
| mysql_query_rules            |
| global_variables             |
| mysql_collations             |
+------------------------------+
6 rows in set (0.00 sec)

mysql_servers table

mysql_servers – this is a table in which a user defines a list of backend nodes. Sample contents may look like below:

mysql> select * from mysql_serversG
*************************** 1. row ***************************
       hostgroup_id: 0
           hostname: 172.30.4.212
               port: 3306
             status: ONLINE
             weight: 1
        compression: 0
    max_connections: 1000
max_replication_lag: 0
*************************** 2. row ***************************
       hostgroup_id: 1
           hostname: 172.30.4.145
               port: 3306
             status: ONLINE
             weight: 1
        compression: 0
    max_connections: 1000
max_replication_lag: 0
*************************** 3. row ***************************
       hostgroup_id: 1
           hostname: 172.30.4.25
               port: 3306
             status: ONLINE
             weight: 1
        compression: 0
    max_connections: 1000
max_replication_lag: 0
3 rows in set (0.00 sec)

As you can see, there are couple of things you can set here. Some are obvious like hostname and port. Additionally you can set a status of the node, for example, mark it as offline and make sure it is not receiving any traffic.

Weight is also self-explanatory – you can define different weight on different nodes in order to route larger part of the traffic to some of the nodes which maybe have better, faster hardware. For a Galera Cluster it won’t make much of a difference (although even here you could benefit from routing more or less read-only traffic to some nodes only), but such setup is not uncommon when we use MySQL replication and, for example, one node is a dedicated backup host. In such case you may still want this node to be in rotation – if everything else fails, it’s still better to use it than to run out of backends, but as long as everything is ok, you don’t want it to take as much traffic as other slaves.

Compression – this feature allows you to enable compression between client and proxy. Similar to regular MySQL, compression has to be initiated by the client. Max connections and max replication lag – those two settings are also pretty clear. It’s all about maximum number of connections ProxySQL can direct to a particular backend. Maximum replication lag, on the other hand, tells us how badly a node can lag before it will be automatically shunned and removed out of rotation. For this feature to work, the monitoring module has to be configured correctly.

We left the first attribute, hostgroup_id, at the end – this is because it requires some additional explanation. ProxySQL is designed around the concept of hostgroups – different set of nodes which are somehow related. This can be anything, really. The simplest example would be master and slaves. Master would form one hostgroup, with only one node in it, slaves would form another hostgroup. But it can be much more complex. Maybe you want to combine a couple of slaves or nodes which serve a particular type of the traffic? Maybe you want to differentiate between regular slaves, reporting slaves and backup slaves? Hostgroups are there to make this possible. You can create different hostgroups and then route different type of traffic to them – we’ll get to that part later in this post.

mysql_users table

mysql> select * from mysql_usersG
*************************** 1. row ***************************
              username: sbtest
              password: sbtest
                active: 1
               use_ssl: 0
     default_hostgroup: 0
        default_schema: NULL
         schema_locked: 0
transaction_persistent: 0
          fast_forward: 0
               backend: 1
              frontend: 1
       max_connections: 10000
1 row in set (0.00 sec)

This table stores settings about users that are allowed to connect to the proxy and to the database. In general, when a client connects to a proxy, it authenticates against ProxySQL. ProxySQL, on the other hand, keeps multiple connections open to the backend and determines if one of them can be used for this particular user. If not, a new connection is created.

Most important settings here are, obviously, username and password. Next we can set the ‘active’ flag, determining if the account is treated as active or not. You can also enable SSL for the connection. Default hostgroup will come important later, when we’ll discuss the query routing. In general, this is a hostgroup where all traffic that did not match any routing rules will end. As long as it was generated by this particular user, that is. Default schema and max connections are the remaining settings we’d like to cover here – those are pretty self-explanatory. You can define a default schema for the connections created by a particular user – similar to how you do that when using MySQL client. Max connections is also pretty clear – it’s a maximum number of connections that given user can have open.

mysql_replication_hostgroups

mysql> show create table mysql_replication_hostgroupsG
*************************** 1. row ***************************
       table: mysql_replication_hostgroups
Create Table: CREATE TABLE mysql_replication_hostgroups (
writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY,
reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND reader_hostgroup>0),
UNIQUE (reader_hostgroup))

This table can be used to define master – slave relationship between hostgroups. The idea behind it is that, under some conditions (two hostgroups only, one for a master and one for all slaves, read_only is used to differentiate master and slaves) it is possible for ProxySQL to start monitoring the read_only flag on those hosts. Through this, it can follow the topology changes and automatically introduce changes in the server definitions to mirror the topology. As we stated, you need to use the ‘read_only’ flag to mark master (read_only=0) and slaves (read_only=1). If you promote slave as a new master and change read_only flags accordingly, ProxySQL can detect such change and move old ‘master’ host to the ‘reader’ hostgroup while new master will be moved to the ‘writer’ hostgroup. This feature requires to have a correctly configured monitoring module and query rules set accordingly (send DML’s to the ‘writer’ hostgroup, SELECTs to the ‘reader’ hostgroup). We will discuss different failover scenarios in MySQL replication environment using ProxySQL in a follow-up post.

mysql_query_rules table

mysql> select * from mysql_query_rulesG
*************************** 1. row ***************************
              rule_id: 1
               active: 1
             username: NULL
           schemaname: NULL
               flagIN: 0
         match_digest: NULL
        match_pattern: ^SELECT.*FOR UPDATE$
 negate_match_pattern: 0
              flagOUT: NULL
      replace_pattern: NULL
destination_hostgroup: 0
            cache_ttl: NULL
            reconnect: NULL
              timeout: NULL
                delay: NULL
            error_msg: NULL
                apply: 1
*************************** 2. row ***************************
              rule_id: 2
               active: 1
             username: NULL
           schemaname: NULL
               flagIN: 0
         match_digest: NULL
        match_pattern: ^SELECT
 negate_match_pattern: 0
              flagOUT: NULL
      replace_pattern: NULL
destination_hostgroup: 1
            cache_ttl: 100
            reconnect: NULL
              timeout: NULL
                delay: NULL
            error_msg: NULL
                apply: 1
2 rows in set (0.01 sec)

This table contains a list of different rules which governs how a query will be routed. Column ‘rule_id’ is auto-incremented. The idea is that queries are being tested against rules in order of rule_id (which means in order they were created). This works similarly to how iptables rules do – a query is checked against ‘match_pattern’ for every rule starting from ‘rule_id’=1. If a query matches the pattern, it is routed to the ‘destination_hostgroup’. If not – it’s checked against another rule. If a query doesn’t match any rules, it’ll be routed to the hostgroup defined in ‘default_hostgroup’ for the user who issued the query (in mysql_user table).

Other settings worth mentioning are ‘replace_pattern’ – it allows you to replace a matched pattern by some other string. This allows user to modify queries and rewrite them into some other form – pretty useful when you are dealing with a query you can’t modify on the application side, yet you know it can be rewritten into a more optimal form. When talking about matching, there’s also ‘negate_match_pattern’ – when enabled, the rule will match every query but the ones which fit the matching pattern. The rule can be ‘active’ or not, it’s also possible to define ‘cache_ttl’ – a time in milliseconds for which ProxySQL will cache queries that match a given pattern. Queries can also be filtered by the user which issue them or by schema they connect to – we are talking about ‘USE myschema’ type of schema switch or default schema defined when starting a connection.

Our example is almost the most simplest one you can have – it implements read-write split. In the first rule we catch all SELECT … FOR UPDATE queries and route them to the hostgroup ‘0’ – we also have this hostgroup defined as a default one for our application user. Second rule, matching SELECT statements, send those queries to the hostgroup ‘1’. All queries which does not match the rules will be routed to the default hostgroup, ‘0’. So, in short – all SELECTs are routed to the hostgroup ‘1’ – our ‘read’ group. Remaining queries (INSERTs, UPDATEs, ALTERs etc), including SELECT … FOR UPDATE, are routed to the hostgroup ‘0’ – our ‘write’ group.

Remaining tables

There are two more tables in the ‘main’ and ‘disk’ schemas, which we are not going to cover in detail. One of them is ‘mysql_collations’ which lists collations supported by ProxySQL. Another one, ‘global_variables’ contains different variables in a key->value format which govern ProxySQL’s behavior. Any changes have to be introduced here, by updating relevant key and loading changes into runtime using ‘LOAD MYSQL VARIABLES TO RUNTIME’ or ‘LOAD ADMIN VARIABLES TO RUNTIME’ and save them to the persistent storage using ‘SAVE MYSQL VARIABLES TO DISK’ or ‘SAVE ADMIN VARIABLES TO DISK’. Which one to choose (MYSQL or ADMIN) depends on which variables you’ve changed. There are two types of variables – ones with a prefix of ‘admin-’ and ones with a prefix of ‘mysql-’. This governs which statement should be used to apply configuration changes.

Detailed guide on the ProxySQL variables is out of scope for this blog post – there are many things you can tweak here, starting from access credentials for ProxySQL management CLI to how it monitors hosts. We will discuss some of them in upcoming posts when we’ll be discussing the process of setting up ProxySQL to work with MySQL replication and Galera.

‘stats’ schema

This schema stores different types of statistical data related to the operations within ProxySQL.

mysql> show tables from stats;
+--------------------------------+
| tables                         |
+--------------------------------+
| stats_mysql_query_rules        |
| stats_mysql_commands_counters  |
| stats_mysql_processlist        |
| stats_mysql_connection_pool    |
| stats_mysql_query_digest       |
| stats_mysql_query_digest_reset |
| stats_mysql_global             |
+--------------------------------+
7 rows in set (0.01 sec)

We can see here statistics regarding query rules – how many times a given rule was used. MySQL counter statistics tells us how often a particular MySQL command has been executed and what the execution time looks like. MySQL processlist is another very useful tool – it gives you a view, similar to MySQL ‘SHOW PROCESSLIST’ command, with information about running queries, from which host the query originated, to which backend it was routed, how long it’s being executed. Another table, ‘stats_mysql_connection_pool’ gives us a list of the backends, their states and statistics – how many connections do we have open against each of them, how many errors happened on each backend, how many connections are free in the pool, how many queries have been executed, how much data has been exchanged from a backend node. ProxySQL provides you also with overall statistics about its operations – numbers of connections created, aborted and connected on both sides (client and backend), information about data transferred through the proxy, number of queries, rollbacks and so on.

Finally, we have a table called ‘stats_mysql_query_digest’, which contains data about queries – hostgroup where it was executed, digest hash, generic form of the query, how many times it was executed, when it was first executed, when was the last time a given query showed up in ProxySQL, summary time for all executions, maximum and minimum execution time. This data gives a DBA a nice, overall overview of the performance of his queries. It’s not something which could replace SQL reviews based on slow log and executed using pt-query-digest, but it should be more than enough to handle most of the typical cases and significantly helps with debugging performance problems.

ProxySQL allows you to flush the data located in this table – you can query stats_mysql_query_digest_reset table and statistics will be cleared after you get the result.

‘monitor’ schema

Monitor schema stores a couple of tables related to the monitor module. You can find here information about checks that the monitoring module performs – if it was able to connect to the backend node, what the network latency looked like. If you use MySQL replication and configured writer and reader hostgroups, you’ll see also data on read_only flag. There’s also a log covering slave lag, should you decide to shun some of the slaves after the lag crosses some threshold.

Monitoring in ProxySQL

We are reaching the end of this blog post, but before we conclude it, we’d like to discuss one rather important characteristic of ProxySQL. There are many ways to monitor MySQL’s health. Some applications use ‘ping’ to check the status of MySQL servers, this pattern is also used on the connector level. Some tools check if port 3306 is open, assuming that open port means that MySQL is up and running. Some tools use monitoring modules which check the state of MySQL on some predefined interval. All these methods have flaws, unfortunately. There’s always a period of time between ‘ping’ and the query itself (even if it’s a very short period) – in that time MySQL could have died. Open port doesn’t mean you can actually connect to the MySQL instance and execute a query. Monitoring modules usually have some kind of interval that they run on – every 1 second, every 0.1 second. Between subsequent runs, there’s always some time in which MySQL could have changed its state.

ProxySQL uses different approach. It does use monitoring module, but this module is not used for the purpose of checking the health of a backend node. It can be used to check some of the properties of the backend – how big is the slave lag or whether the read_only is enabled or not. The health, though, is checked on multiple stages during the query execution – if a connection can be established to a backend and the query executed correctly, the backend is alive. If it cannot be established for a defined period of time (1000 milliseconds by default – it’s defined by mysql-connect_timeout_server variable), or something else prevents the query execution (query got killed or backend crashed when query was still running), backend is treated as unhealthy. By default, five failed attempts result in a backend marked as ‘shunned’ and the query is routed to another backend in the hostgroup (if any). If there is no available backend or all of them are not available, ProxySQL will keep trying to execute a query for mysql-connect_timeout_server_max – by default it’s 10000 ms. If it hits this timeout, an error is returned to the application. Server which is shunned won’t receive any further traffic for a duration of mysql-shun_recovery_time_sec (10s by default). After this time, the backend is marked again as online and it takes part in the query routing. Status of the server is checked on all steps of the query execution, so it’s not that only new queries can mark backend node as shunned.

There are pros and cons of this approach. What’s great is that, as long as there’s a backend node available, the application won’t see any errors. The application connects to ProxySQL and the rest of the routing, along with problem solving and rerouting queries from failed nodes, happens in the background and is transparent to the application. As long as ProxySQL can find a host able to run a query and not spend more than 10s on it, there’s no error to the application.

On the cons side, when a node goes down, some time is needed before it gets marked as ‘shunned’ and traffic gets routed away from it. It means that for a duration of mysql-connect_timeout_server queries will hit a backend node which is down. They will, eventually, be rerouted to some other nodes but their latency will be increased up to mysql-connect_timeout_server milliseconds. It is disputable whether it’s better to rollback a transaction or accept increased latency – it depends on the query mix and how complex transactions are. Simple transactions with a single query would probably be better with retrying them immediately on a new host. Large, long and complex transactions would probably benefit from waiting a bit than being rolled back. On the other hand, timeouts are configurable so you can control the theoretical maximal increase in latency.

In this post we’ve gone through the ProxySQL setup and some of the features it provides. We’ve discussed some of the aspects of its configuration. In the next blog on ProxySQL, we are going to focus on setting up a MySQL replication topology and show how to monitor operations inside ProxySQL.

ClickHouse Schema Design and Data Modeling

Building a Modern Analytics Stack Around ClickHouse

Managing ClickHouse Resources in Multi-Tenant Environments

How to Migrate from SQL Server to PostgreSQL HA for Dockerized Apps