blog

What’s New in ProxySQL 2.0

Krzysztof Ksiazek

Published May 6, 2019

ProxySQL is one of the best proxies out there for MySQL. It introduced a great deal of options for database administrators. It made possible to shape the database traffic by delaying, caching or rewriting queries on the fly. It can also be used to create an environment in which failovers will not affect applications and will be transparent to them. We already covered the most important ProxySQL features in previous blog posts:

We even have a tutorial covering ProxySQL showing how it can be used in MySQL and MariaDB setups.

Quite recently ProxySQL 2.0.3 has been released, being a patch release for the 2.0 series. Bugs are being fixed and the 2.0 line seems to start getting the traction it deserves. In this blog post we would like to discuss major changes introduced in ProxySQL 2.0.

Causal Reads Using GTID

Everyone who had to deal with replication lag and struggled with read-after-write scenarios that are affected by the replication lag will definitely be very happy with this feature. So far, in MySQL replication environments, the only way to ensure causal reads was to read from the master (and it doesn’t matter if you use asynchronous or semisynchronous replication). Another option was to go for Galera, which had an option for enforcing causal reads since, like, always (first it used to be wsrep-causal-reads and now it is wsrep-sync-wait). Quite recently (in 8.0.14) MySQL Group replication got similar feature. Regular replication, though, on its own, cannot deal with this issue. Luckily, ProxySQL is here and it brings us an option to define on per-query rule basis with what hostgroup reads which match that query rule should be consistent. The implementation comes with ProxySQL binlog reader and it can work with ROW binlog format for MySQL 5.7 and newer. Only Oracle MySQL is supported due to lack of required functionality in MariaDB. This feature and its technical details have been explained on ProxySQL official blog.

SSL for Frontend Connections

ProxySQL always had support for backend SSL connection but it lacked SSL encryption for the connections coming from clients. This was not that big of a deal given the recommended deployment pattern was to collocate ProxySQL on application nodes and use a secure Unix socket to connect from the app to the proxy. This is still a recommendation, especially if you use ProxySQL for caching queries (Unix sockets are faster than TCP connection, even local ones and with cache it’s good to avoid introducing unnecessary latency). What’s good is that with ProxySQL 2.0 there’s a choice now as it introduced SSL support for incoming connections. You can easily enable it by setting mysql-have_ssl to ‘true’. Enabling SSL does not come with unacceptable performance impact. Contrary, as per results from the official ProxySQL blog, the performance drop is very low.

Native Support for Galera Cluster

Galera Cluster has been supported by ProxySQL almost since beginning but so far it was done through the external script that (typically) has been called from ProxySQL’s internal scheduler. It was up to the script to ensure that ProxySQL configuration was proper, the writer (or writers) has been correctly detected and configured in the writers hostgroup. The script was able to detect the different states Galera node may have (Primary, non-Primary, Synced, Donor/Desync, Joining, Joined) and mark the node accordingly as either available or not. The main issue is that the original script never was intended as anything other than the proof of concept written in Bash. Yet as it was distributed along with ProxySQL, it started to be improved, modified by external contributors. Others (like Percona) looked into creating their own scripts, bundled with their software. Some fixes have been introduced in the script from ProxySQL repository, some have been introduced into Percona version of the script. This led to confusion and even though all commonly used scripts handled 95% of the use cases, none of the popular ones really covered all the different situations and variables Galera cluster may end up using. Luckily, the ProxySQL 2.0 comes with native support for Galera Cluster. This makes ProxySQL support internally MySQL replication, MySQL Group Replication and now Galera Cluster. The way in how it’s done is very similar. We would like to cover the configuration of this feature as it might be not clear at the first glance.

As with MySQL replication and MySQL Group Replication, a table has been created in ProxySQL:

mysql> show create table mysql_galera_hostgroupsG
*************************** 1. row ***************************
       table: mysql_galera_hostgroups
Create Table: CREATE TABLE mysql_galera_hostgroups (
    writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY,
    backup_writer_hostgroup INT CHECK (backup_writer_hostgroup>=0 AND backup_writer_hostgroup<>writer_hostgroup) NOT NULL,
    reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND backup_writer_hostgroup<>reader_hostgroup AND reader_hostgroup>0),
    offline_hostgroup INT NOT NULL CHECK (offline_hostgroup<>writer_hostgroup AND offline_hostgroup<>reader_hostgroup AND backup_writer_hostgroup<>offline_hostgroup AND offline_hostgroup>=0),
    active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1,
    max_writers INT NOT NULL CHECK (max_writers >= 0) DEFAULT 1,
    writer_is_also_reader INT CHECK (writer_is_also_reader IN (0,1,2)) NOT NULL DEFAULT 0,
    max_transactions_behind INT CHECK (max_transactions_behind>=0) NOT NULL DEFAULT 0,
    comment VARCHAR,
    UNIQUE (reader_hostgroup),
    UNIQUE (offline_hostgroup),
    UNIQUE (backup_writer_hostgroup))
1 row in set (0.00 sec)

There are numerous settings to configure and we will go over them one by one. First of all, there are four hostgroups:

Writer_hostgroup – it will contain all the writers (with read_only=0) up to the ‘max_writers’ setting. By default it is just only one writer
Backup_writer_hostgroup – it contains remaining writers (read_only=0) that are left after ‘max_writers’ has been added to writer_hostgroup
Reader_hostgroup – it contains readers (read_only=1), it may also contain backup writers, as per ‘writer_is_also_reader’ setting
Offline_hostgroup – it contains nodes which were deemed not usable (either offline or in a state which makes them impossible to handle traffic)

Then we have remaining settings:

Active – whether the entry in mysql_galera_hostgroups is active or not
Max_writers – how many nodes at most can be put in the writer_hostgroup
Writer_is_also_reader – if set to 0, writers (read_only=0) will not be put into reader_hostgroup. If set to 1, writers (read_only=0) will be put into reader_hostgroup. If set to 2, nodes from backup_writer_hostgroup will be put into reader_hostgroup. This one is a bit complex therefore we will present an example later in this blog post
Max_transactions_behind – based on wsrep_local_recv_queue, the max queue that’s acceptable. If queue on the node exceeds max_transactions_behind given node will be marked as SHUNNED and it will not be available for the traffic

The main surprise might be handling of the readers, which is different than how the script included in ProxySQL worked. First of all, what you have to keep in mind, is the fact, that ProxySQL uses read_only=1 to decide if node is a reader or not. This is common in replication setups, not that common in Galera. Therefore, most likely, you will want to use ‘writer_is_also_reader’ setting to configure how readers should be added to the reader_hostgroup. Let’s consider three Galera nodes, all of them have read_only=0. We also have max_writers=1 as we want to direct all the writes towards one node. We configured mysql_galera_hostgroups as follows:

SELECT * FROM mysql_galera_hostgroupsG
*************************** 1. row ***************************
       writer_hostgroup: 10
backup_writer_hostgroup: 30
       reader_hostgroup: 20
      offline_hostgroup: 40
                 active: 1
            max_writers: 1
  writer_is_also_reader: 0
max_transactions_behind: 0
                comment: NULL
1 row in set (0.00 sec)

Let’s go through all the options:

writer_is_also_reader=0

mysql> SELECT hostgroup_id, hostname FROM runtime_mysql_servers;
+--------------+------------+
| hostgroup_id | hostname   |
+--------------+------------+
| 10           | 10.0.0.103 |
| 30           | 10.0.0.101 |
| 30           | 10.0.0.102 |
+--------------+------------+
3 rows in set (0.00 sec)

This outcome is different than you would see in the scripts – there you would have remaining nodes marked as readers. Here, given that we don’t want writers to be readers and given that there is no node with read_only=1, no readers will be configured. One writer (as per max_writers), remaining nodes in backup_writer_hostgroup.

writer_is_also_reader=1

mysql> SELECT hostgroup_id, hostname FROM runtime_mysql_servers;
+--------------+------------+
| hostgroup_id | hostname   |
+--------------+------------+
| 10           | 10.0.0.103 |
| 20           | 10.0.0.101 |
| 20           | 10.0.0.102 |
| 20           | 10.0.0.103 |
| 30           | 10.0.0.101 |
| 30           | 10.0.0.102 |
+--------------+------------+
6 rows in set (0.00 sec)

Here we want our writers to act as readers therefore all of them (active and backup) will be put into the reader_hostgroup.

writer_is_also_reader=2

mysql> SELECT hostgroup_id, hostname FROM runtime_mysql_servers;
+--------------+------------+
| hostgroup_id | hostname   |
+--------------+------------+
| 10           | 10.0.0.103 |
| 20           | 10.0.0.101 |
| 20           | 10.0.0.102 |
| 30           | 10.0.0.101 |
| 30           | 10.0.0.102 |
+--------------+------------+
5 rows in set (0.00 sec)

This is a setting for those who do not want their active writer to handle reads. In this case only nodes from backup_writer_hostgroup will be used for reads. Please also keep in mind that number of readers will change if you will set max_writers to some other value. If we’d set it to 3, there would be no backup writers (all nodes would end up in the writer hostgroup) thus, again, there would be no nodes in the reader hostgroup.

Of course, you will want to configure query rules accordingly to the hostgroup configuration. We will not go through this process here, you can check how it can be done in ProxySQL blog. If you would like to test how it works in a Docker environment, we have a blog which covers how to run Galera cluster and ProxySQL 2.0 on Docker.

Other Changes

What we described above are the most notable improvements in ProxySQL 2.0. There are many others, as per the changelog. Worth mentioning are improvements around query cache (for example, addition of PROXYSQL FLUSH QUERY CACHE) and change that allows ProxySQL to rely on super_read_only to determine master and slaves in replication setup.

We hope this short overview of the changes in ProxySQL 2.0 will help you to determine which version of the ProxySQL you should use. Please keep in mind that 1.4 branch, even if it will not get any new features, it still is maintained.

Beyond Semantics: Enhancing Retrieval Augmented Generation with Hybrid Search (pgvector + Elasticsearch)

DevOpsDays Houston 2024: Our conference highlights

Simplifying database orchestration anywhere with Terraform and ClusterControl

ClusterControl now easily deployable in Kubernetes with our Helm Chart