Deploying Galera Clusters across WAN environments might lead to concerns around data privacy and security - especially as more organisations are having to comply with national and international regulations. You would not want hackers eavesdropping on or intercepting replication traffic. Encrypted replication hides what is sent between the Galera nodes, and makes sure each node only communicates with the ones it trusts. But how expensive is encryption?
In this blog, we will show you how to encrypt the replication traffic between your Galera nodes. We will also look into the performance impact of this encryption.
Galera supports SSL for the encryption of replication traffic. When encryption is enabled, Galera group communication and Incremental State Transfer (IST) happen over an SSL-encrypted connection. The default SST methods do not support SSL; however, this can be scripted.
All nodes in a Galera Cluster communicate via a group communication protocol (default port is 4567), whereby writesets are replicated from originating nodes to the rest of the nodes for certification and possibly commit. This communication protocol can be encrypted with an SSL certificate and a private key.
Note that this does not encrypt the communication between the MySQL client (the application) and the MySQL instances. Finally, at the time of writing, some extra steps are required when adding nodes automatically via ClusterControl.
Enabling Encrypted Replication
Enabling encryption is now supported by the s9s tools available from our GitHub repository. This feature will be included in the next release of ClusterControl, scheduled for the end of August. The implementation flow is based on Codership’s Galera Cluster documentation.
The following requirements must be met if you would like to run encrypted Galera replication:
- All nodes in the cluster must run with either SSL enabled or disabled. You cannot have some of them run on SSL while others do not.
- All Galera nodes must use the same certificate and key file.
- The Galera Cluster needs to be re-bootstrapped to load or unload the SSL parameters.
To get this functionality with the current version of ClusterControl (1.2.6 or earlier), clone the s9s-admin GitHub repository on the ClusterControl host:
$ git clone https://github.com/severalnines/s9s-admin
Or, if you already have the repository cloned, just pull the latest changes:
$ cd s9s-admin
$ git pull
Replace the s9s_galera script located in /usr/bin with the latest one from GitHub:
$ cp s9s-admin/cluster/s9s_galera /usr/bin/
To enable Galera encrypted replication with ClusterControl, run the following command:
$ s9s_galera --encrypt-replication -i 1 -o enable
This action will generate a 2048-bit key and certificate on the ClusterControl node and transfer them to all the Galera nodes. Then, it will stop the Galera cluster (cluster ID 1) and configure wsrep_provider_options in the MySQL config files. Once done, it will start the cluster by re-bootstrapping it.
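For readers who want to picture what this step amounts to on each node, here is a minimal sketch based on the Codership documentation that the tool follows. It is not the actual s9s_galera implementation: the function names, the certificate subject and the validity period are assumptions chosen for illustration. The socket.ssl_key and socket.ssl_cert settings are the Galera provider options that carry the key and certificate paths, and the paths used here are the ones the tool reports in its status output.

```shell
#!/bin/sh
# Sketch only: helper names, certificate subject and validity are assumptions.

# Create a self-signed certificate and an unencrypted RSA key of the given size.
# usage: generate_galera_cert <bits> <key_file> <cert_file>
generate_galera_cert() {
    openssl req -new -x509 -nodes -days 365 \
        -newkey "rsa:$1" \
        -subj "/CN=galera-replication" \
        -keyout "$2" -out "$3"
}

# Print the my.cnf line that points Galera at the key and certificate.
# usage: galera_ssl_options <key_file> <cert_file>
galera_ssl_options() {
    printf 'wsrep_provider_options="socket.ssl_key=%s;socket.ssl_cert=%s"\n' "$1" "$2"
}

# Show the configuration line for the paths used by the tool.
galera_ssl_options /etc/ssl/galera/cluster_1/galera_rep.key /etc/ssl/galera/cluster_1/galera_rep.crt
```

Calling generate_galera_cert 2048 galera_rep.key galera_rep.crt would create the key pair in the current directory. The same two files have to be present on every node, and if wsrep_provider_options already holds other settings, the SSL options need to be merged into the existing semicolon-separated value rather than added as a second line.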
Due to the Heartbleed bug, the tool will automatically update the OpenSSL package to the latest version using the package manager before generating keys and certificates. If your ClusterControl host does not have an internet connection, you may skip this step by appending -s to the command line. The default 2048-bit key size can also be overridden with the -b option.
To verify whether encryption has been enabled, use the status parameter:
$ s9s_galera --encrypt-replication -i1 -o status
load opts 1
Cluster Address: 10.0.0.21:4567,10.0.0.22:4567,10.0.0.23:4567
Galera port: 4567
Cluster name: my_wsrep_cluster
Garbd (arbitrators):
OS class: redhat
10.0.0.21 key: /etc/ssl/galera/cluster_1/galera_rep.key, cert: /etc/ssl/galera/cluster_1/galera_rep.crt, status: enabled
10.0.0.22 key: /etc/ssl/galera/cluster_1/galera_rep.key, cert: /etc/ssl/galera/cluster_1/galera_rep.crt, status: enabled
10.0.0.23 key: /etc/ssl/galera/cluster_1/galera_rep.key, cert: /etc/ssl/galera/cluster_1/galera_rep.crt, status: enabled
You can also verify this using the ClusterControl UI, under Performance > DB Variables > wsrep_provider_options, similar to the screenshot below:
From the Galera point-of-view, we can see from the MySQL error log file that SSL handshakes were successful:
[Note] WSREP: initializing ssl context
[Note] WSREP: backend: asio
[Note] WSREP: GMCast version 0
[Note] WSREP: (221ec50e-0dd0-11e4-b86a-f69141249b88, 'ssl://0.0.0.0:4567') listening at ssl://0.0.0.0:4567
[Note] WSREP: (221ec50e-0dd0-11e4-b86a-f69141249b88, 'ssl://0.0.0.0:4567') multicast: , ttl: 1
[Note] WSREP: EVS version 0
[Note] WSREP: PC version 0
[Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '10.0.0.63:4567'
[Note] WSREP: SSL handshake successful, remote endpoint ssl://10.0.0.63:4567 local endpoint ssl://10.0.0.62:60968 cipher: AES128-SHA compression:
[Note] WSREP: (221ec50e-0dd0-11e4-b86a-f69141249b88, 'ssl://0.0.0.0:4567') turning message relay requesting on, nonlive peers: ssl://10.0.0.61:4567
[Note] WSREP: SSL handshake successful, remote endpoint ssl://10.0.0.61:36848 local endpoint ssl://10.0.0.62:4567 cipher: AES128-SHA compression:
[Note] WSREP: (221ec50e-0dd0-11e4-b86a-f69141249b88, 'ssl://0.0.0.0:4567') turning message relay requesting off
[Note] WSREP: declaring 1d52d611-0dd0-11e4-8f89-6e3c188d78f4 stable
[Note] WSREP: declaring 206e8018-0dd0-11e4-a117-6361fc40c2d1 stable
...
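To check a node without reading the whole log, the handshake lines can be counted with standard tools. This is a small sketch: the function name is made up for illustration, and the log path in the usage note is an assumption that depends on your distribution and MySQL configuration.

```shell
#!/bin/sh
# Count successful Galera SSL handshakes recorded in a MySQL error log.
# usage: count_ssl_handshakes <error_log_file>
count_ssl_handshakes() {
    # grep -c prints the number of matching lines (0 if there are none)
    grep -c 'WSREP: SSL handshake successful' "$1"
}
```

For example, count_ssl_handshakes /var/log/mysqld.log (path assumed) on one node of a three-node cluster should report at least one handshake per peer after a clean start. A count of 0 suggests the node is not using SSL or has not connected to any peer yet.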
Disabling encryption is as easy as enabling it:
$ s9s_galera --encrypt-replication -i 1 -o disable
This will remove the SSL-related parameters from wsrep_provider_options in the MySQL configuration file. Note that adding a new node to an encrypted Galera cluster is not supported yet; this is due to be included in the next ClusterControl release. As a workaround, you can disable encryption before adding the node and re-enable it afterwards from the command line.
Let’s look at the performance impact of encrypting the replication traffic. We ran a simple benchmark using Sysbench 0.5 on a three-node MySQL Galera Cluster 5.6 (Codership) running on AWS EC2 m3.xlarge instances (4 CPUs, 15 GB RAM, 50 GB SSD - 300 IOPS). The cluster was deployed using the Galera Configurator with the default 2048-bit private key. Sysbench distributes connections between the specified MySQL hosts on a round-robin basis.
These tests used the same data set. We prepared 20 million records (~5.5 GB of data) with the following command:
$ sysbench \
  --db-driver=mysql \
  --mysql-table-engine=innodb \
  --test=/usr/share/sysbench/tests/db/oltp.lua \
  --oltp-table-size=20000000 \
  --mysql-host=10.0.0.239 \
  --mysql-port=3306 \
  --mysql-user=sbtest \
  --mysql-password=password \
  prepare
The cluster was re-bootstrapped after each test.
Test 1: Write Performance
We performed a sysbench OLTP complex test and disabled all reads to test the write performance with 5 million requests and 32 threads using the following command:
$ sysbench \
  --db-driver=mysql \
  --num-threads=32 \
  --max-requests=5000000 \
  --oltp-table-size=20000000 \
  --oltp-test-mode=complex \
  --test=/usr/share/sysbench/tests/db/oltp.lua \
  --oltp-range-size=0 \
  --oltp-point-selects=0 \
  --oltp-simple-ranges=0 \
  --oltp-sum-ranges=0 \
  --oltp-order-ranges=0 \
  --oltp-distinct-ranges=0 \
  --mysql-host=10.0.0.239,10.0.0.240,10.0.0.241 \
  --mysql-port=3306 \
  --mysql-user=sbtest \
  --mysql-password=password \
  run
The above command performed the following number of db operations:
- read: 0
- write: 20013067
- other: 10003433
- total: 30016500
Plain replication took 4042s to complete the test while encrypted replication took 5106s, which is about 26% longer (roughly a 21% drop in throughput). The following graph captured from ClusterControl shows the number of write operations performed on the cluster (the higher the better):
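The 26% figure can be reproduced from the two run times. The sketch below (the function name is ours, purely for illustration) prints both the increase in run time and the equivalent drop in throughput, since the same number of requests was executed in both runs.

```shell
#!/bin/sh
# usage: encryption_overhead <plain_seconds> <encrypted_seconds>
encryption_overhead() {
    awk -v plain="$1" -v enc="$2" 'BEGIN {
        # Extra time needed for the same amount of work
        printf "runtime increase: %.1f%%\n", (enc - plain) / plain * 100
        # Same work in more time means lower throughput
        printf "throughput drop: %.1f%%\n", (1 - plain / enc) * 100
    }'
}

# Run times of the write-only test, in seconds
encryption_overhead 4042 5106
```

For the write-only test this prints a run time increase of 26.3% and a throughput drop of 20.8%. The two numbers describe the same measurement from different angles, so it is worth stating which one is meant when comparing results.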
Pay attention to the CPU and network usage between plain and encrypted replication:
Test 2: Read Performance
Next, we ran sysbench in OLTP read-only mode to test the read performance with 1 million requests and 32 threads using the following command:
$ sysbench \
  --db-driver=mysql \
  --num-threads=32 \
  --max-requests=1000000 \
  --oltp-table-size=2000000 \
  --oltp-read-only=on \
  --oltp-dist-type=uniform \
  --test=/usr/share/sysbench/tests/db/oltp.lua \
  --mysql-host=10.0.0.138,10.0.0.139,10.0.0.140 \
  --mysql-port=3306 \
  --mysql-user=sbtest \
  --mysql-password=password \
  run
The above command performed the following number of db operations:
- read: 14000518
- write: 0
- other: 2000074
- total: 16000592
The buffer pool was pre-warmed prior to this test to get predictable results. Encryption does not really affect the read performance since there was no writeset replication involved during the benchmark. This can be confirmed with the following graphs captured by ClusterControl:
Despite the early spike, the rest of the read operations are in the same range, between 3000 and 4000 reads per second. In fact, the total execution time for plain replication was 4277s (233 tps) while for encrypted replication it was 4333s (231 tps). That is not much of a difference for more than an hour of read operations. This is expected, as only writes are distributed between the nodes and subject to encryption.
Test 3: Mixed Read/Write Performance
This test used the OLTP complex test to produce a mixed workload consisting of 65% reads and 35% writes, with 1 million requests and 32 threads, using the following command:
$ sysbench \
  --db-driver=mysql \
  --num-threads=32 \
  --max-requests=1000000 \
  --oltp-table-size=2000000 \
  --oltp-dist-type=uniform \
  --test=/usr/share/sysbench/tests/db/oltp.lua \
  --mysql-host=10.0.0.138,10.0.0.139,10.0.0.140 \
  --mysql-port=3306 \
  --mysql-user=sbtest \
  --mysql-password=password \
  --mysql-ignore-duplicates \
  run
The above command performed the following number of database operations:
- read: 14017570
- write: 4004153
- other: 2001274
- total: 20022997
With plain replication, the test took 2549s (392 tps) to complete, while with encrypted replication it took 2824s (354 tps), which is about 11% longer (roughly a 10% drop in throughput). The following graph captured from ClusterControl shows the number of queries performed on the cluster (the higher the better):
As expected for the CPU and network usage, we can see the difference in the following graphs:
The performance impact of encrypting Galera replication traffic seems to be reasonable. However, this might not necessarily apply to your workload. It depends on your network performance, private key size (1024, 2048 or 4096 bits; larger keys are slower), configuration parameters and hardware. We recommend setting up your own Galera Cluster and giving it a try, since SSL can now be turned on or off with a single command using the s9s tools.