Handling Large Transactions with Streaming Replication and MariaDB 10.4

Krzysztof Ksiazek

Dealing with large transactions was always a pain point in Galera Cluster. The way in which Galera writeset certification works causes troubles when transactions are long or when a single row is being modified often on multiple nodes. As a result, transactions have to be rolled back and retried causing performance drops. Luckily, this problem has been addressed in Galera 4, a new release of Galera from Codership. This library is used in MariaDB 10.4, so installing MariaDB 10.4 the easiest way of testing the newly introduced features. In this blog post we will take a look at how the streaming replication can be used to mitigate problems which used to be a standard issue in previous Galera versions.

We will use three nodes of MariaDB Galera cluster version 10.4.6, which comes with Galera version of 26.4.2.

MariaDB [(none)]> show global status like 'wsrep_provider%';
+-----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name               | Value                                                                                                                                          |
+-----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name         | Galera                                                                                                                                         |
| wsrep_provider_vendor       | Codership Oy <[email protected]>                                                                                                              |
| wsrep_provider_version      | 26.4.2(r4498)                                                                                                                                  |
+-----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.001 sec)

There are three main pain points streaming replication is intended to deal with:

  • Long transactions
  • Large transactions
  • Hot spots in tables

Let’s consider them one by one and see how streaming replication may help us to deal with them but first let’s focus on the writeset certification - the root cause of those issues to show up.

Writeset Certification in Galera Cluster

Galera cluster consists of multiple writeable nodes. Each transaction executed on Galera cluster forms a writeset. Every writeset has to be sent to all of the nodes in the cluster for certification - a process which ensures that all the nodes can apply given transaction. Writesets have to be executed on all of the cluster nodes so if there is any conflict, transaction cannot be committed. What are typical reasons why the transaction cannot be committed? Well, the three points we listed earlier:

  • Long transactions - longer the transaction takes, more likely is that in the meantime another node will execute updates which will eventually conflict with the writeset and prevent it from passing the certification
  • Large transactions - first of all, large transactions are also longer than small ones, so that triggers the first problem. Second problem, strictly related to the large transactions, is the volume of the changes. More rows are going to be updated, more likely is that some write on another node will result in a conflict and the whole transaction would have to be rolled back.
  • Hot spots in tables - more likely given row is to be updated, more probably such update will happen simultaneously on multiple nodes resulting in some of the transactions to be rolled back

The main issue here is that Galera does not introduce any locking on nodes other than the initial node, on which the transaction was opened. The certification process is based on a hope that if one node could execute a transaction, others should be able to do it too. It is true but, as we discussed, there are corner cases in which the probability of this happening is significantly reduced.

In Galera 4, with streaming replication, the behavior has changed and all locks are being taken in all of the nodes. Transactions will be split into parts and each part will be certified on all nodes. After successful certification rows will be locked on all nodes in the cluster. There are a couple of variables that govern how exactly this is done - wsrep_trx_fragment_size and wsrep_trx_fragment_unit define how large the fragment should be and how it should be defined. It is very fine-grained control: you can define fragment unit as bytes, statements or rows which makes it possible to run the certification for every row modified in the transaction. Let’s take a look at how you can benefit from the streaming replication in real life.

Working with the Streaming Replication

Let’s consider the following scenario. We have a transaction to run that takes at least 30 seconds:

BEGIN; UPDATE sbtest.sbtest1 SET k = k - 2 WHERE id < 2000 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 2000 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 2000 ; SELECT SLEEP(30); COMMIT;

Then, while it is running, we will execute SQL that touches similar rows. This will be executed on another node:

BEGIN; UPDATE sbtest.sbtest1 SET k = k - 1 WHERE id < 20 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 20 ; COMMIT;

What would be the result?

The first transaction is rolled back as soon as the second one is executed:

MariaDB [sbtest]> BEGIN; UPDATE sbtest.sbtest1 SET k = k - 2 WHERE id < 2000 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 2000 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 2000 ; SELECT SLEEP(30); COMMIT;
Query OK, 0 rows affected (0.001 sec)

Query OK, 667 rows affected (0.020 sec)
Rows matched: 667  Changed: 667  Warnings: 0

Query OK, 667 rows affected (0.010 sec)
Rows matched: 667  Changed: 667  Warnings: 0

Query OK, 667 rows affected (0.009 sec)
Rows matched: 667  Changed: 667  Warnings: 0

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Query OK, 0 rows affected (0.001 sec)

The transaction on the second node succeeded:

MariaDB [(none)]> BEGIN; UPDATE sbtest.sbtest1 SET k = k - 1 WHERE id < 20 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 20 ; COMMIT;
Query OK, 0 rows affected (0.000 sec)

Query OK, 7 rows affected (0.002 sec)
Rows matched: 7  Changed: 7  Warnings: 0

Query OK, 7 rows affected (0.001 sec)
Rows matched: 7  Changed: 7  Warnings: 0

Query OK, 0 rows affected (0.004 sec)

What we can do to avoid it is to use streaming replication for the first transaction. We will ask Galera to certify every row change:

MariaDB [sbtest]> BEGIN; SET SESSION wsrep_trx_fragment_size=1 ; SET SESSION wsrep_trx_fragment_unit='rows' ; UPDATE sbtest.sbtest1 SET k = k - 2 WHERE id < 2000 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 2000 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 2000 ; SELECT SLEEP(30); COMMIT; SET SESSION wsrep_trx_fragment_size=0;
Query OK, 0 rows affected (0.001 sec)

Query OK, 0 rows affected (0.000 sec)

Query OK, 0 rows affected (0.000 sec)

Query OK, 667 rows affected (1.757 sec)
Rows matched: 667  Changed: 667  Warnings: 0

Query OK, 667 rows affected (1.708 sec)
Rows matched: 667  Changed: 667  Warnings: 0

Query OK, 667 rows affected (1.685 sec)
Rows matched: 667  Changed: 667  Warnings: 0

As you can see, this time it worked just fine. On the second node:

MariaDB [(none)]> BEGIN; UPDATE sbtest.sbtest1 SET k = k - 1 WHERE id < 20 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 20 ; COMMIT;
Query OK, 0 rows affected (0.000 sec)

Query OK, 7 rows affected (33.942 sec)
Rows matched: 7  Changed: 7  Warnings: 0

Query OK, 7 rows affected (0.001 sec)
Rows matched: 7  Changed: 7  Warnings: 0

Query OK, 0 rows affected (0.026 sec)

What is interesting, you can see that the UPDATE took almost 34 seconds to execute - this was caused by the fact that the initial transaction, through the streaming replication, locked all modified rows on all of the nodes and our second transaction had to wait for the first one to complete even though both transactions were executed on different nodes.

This is basically it when it comes to the streaming replication. Depending on the requirements and the traffic you may use it in a less strict manner - we certified every row but you can change that to every n-th row or every statement. You can even decide on the volume of data to certify. This should be enough to match the requirements of your environment.

There are couple more things we would like you to keep in mind and remember. First of all, streaming replication is by no means a solution that should be used by default. This is the reason why it is, by default, disabled. The recommended use case is to manually decide on transactions that would benefit from the streaming replication and enable it at the session level. This is the reason why our examples end with:

SET SESSION wsrep_trx_fragment_size=0;

This statement (setting wsrep_trx_fragment_size to 0) disables streaming replication for current session.

Another thing worth remembering - if you happen to use streaming replication, it will use ‘wsrep_streaming_log’ table in ‘mysql’ schema for persistently storing the data that is streaming. Using this table you can get some idea about the data that is being transferred across the cluster using streaming replication.

Finally, the performance. This is also one of the reasons why you do not want to use streaming replication all the time. Main reason for that is locking - with streaming replication you have to acquire row locks on all of the nodes. This takes time to get the locks and, should you have to rollback the transaction, it will also put the pressure on all nodes to perform the rollback. We ran a very quick test of the performance impact that the streaming replication have. The environment is strictly a test one so do not assume those results to be the same on the production-grade hardware, it is more for you to see what the impact could be.

We tested four scenarios:

  1. Baseline, set global wsrep_trx_fragment_size=0;
  2. set global wsrep_trx_fragment_unit='rows'; set global wsrep_trx_fragment_size=1;
  3. set global wsrep_trx_fragment_unit='statements'; set global wsrep_trx_fragment_size=1;
  4. set global wsrep_trx_fragment_unit='statements'; set global wsrep_trx_fragment_size=5;

We used sysbench r/w test:

sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=4 --events=0 --time=300 --mysql-host=10.0.0.141 --mysql-user=sbtest --mysql-password=sbtest --mysql-port=3306 --tables=32 --report-interval=1 --skip-trx=off --table-size=100000 --db-ps-mode=disable run

The results are:

  1. Transactions: 82.91 per sec., queries: 1658.27 per sec. (100%)
  2. Transactions: 54.72 per sec., queries: 1094.43 per sec. (66%)
  3. Transactions: 54.76 per sec., queries: 1095.18 per sec. (66%)
  4. Transactions: 70.93 per sec., queries: 1418.55 per sec. (86%)

As you can see, the impact is significant, performance drops even by 33%.

We hope you found this blog post informative and it gave you some insights into the streaming replication that comes with Galera 4 and MariaDB 10.4. We tried to cover use cases and potential drawbacks related to this new technology.

ClusterControl
The only management system you’ll ever need to take control of your open source database infrastructure.