Multi-data center secure e-payment processing using MariaDB Galera

When their original MySQL replication-based architecture wasn’t scaling, the Paytrail ops team needed to migrate to a more robust one with no downtime. ClusterControl and the Severalnines team helped them do so in under 4 months.

Paytrail
Industry: Financial services
Technologies: Galera, MariaDB
Hosting: On-premises
Datacenters: 2
Products: ClusterControl

Background

Paytrail provides payment processing services to merchants. The company provides everything that is needed for online shopping. In addition to traditional payment methods (bank e-payments, credit and debit card payments, invoicing and installments), Paytrail also offers an online shopping solution that allows consumers to use one login for all their online purchases.

Paytrail has grown enormously in the past few years, and was ranked 104th on the Deloitte Fast 500 EMEA 2013, a ranking of the 500 fastest-growing technology companies in EMEA. The company grew 1464% over the past five years.

Paytrail was founded in 2007, and is based in Jyväskylä, Finland. With around 40 employees and 4000 business customers, the company is a licensed payment institution. Tens of millions of euros pass through their systems every month.

Challenge

Paytrail’s business is processing payments for merchants. This means that, if they are down, merchants cannot accept payments online. Uptime is a core feature of Paytrail’s business, and the ops team does everything they can to make sure the systems are always up.

The databases are write-intensive, with more inserts/updates than selects. The lifecycle of a payment can generate dozens of state changes, and every single change has to be written into the database. Everything has to be tracked.

Online merchants also have their own reporting tools to access their data, including transaction history, settlements, bookkeeping reports, company information, etc. Reporting can be done on the whole history, which can be up to 7 years for some customers. These reports can generate heavy selects on the database.

With systems distributed across 2 separate data centers, Paytrail had traditionally used regular MySQL replication in a master-master setup to achieve redundancy. However, this was neither a scalable nor a robust solution. Growth was a concern: the technology should not become a limiting factor in the company’s growth.

To avoid consistency problems with the master-master setup, all applications were configured to write to only one database at a time. But whenever the master failed, all connections had to be rerouted to the secondary master. This did not happen very often, but it was disruptive to the applications when it did. Maintenance of the database instances was also problematic. In short, the existing setup did not tolerate failure well.

“Our old master-master MySQL setup did not tolerate failures very well. Although it worked quite well most of the time, it did not cope well with any disruptions. We knew that we had to find a better solution.”

Niko Lethonen, Services Director

Paytrail needed a new database architecture that could handle a datacenter failure without disrupting their operations. It was also important to keep the migration work to a minimum, as a rewrite of applications would not be feasible.

Solution

Paytrail had considered proprietary databases, but the cost was prohibitive so this was ruled out very quickly. There were quite a few solutions within the MySQL ecosystem, and Galera looked like a very good fit for what Paytrail was trying to achieve. It could be deployed across 2 data centers and could manage failures in a robust way.

Galera could also scale by adding more nodes and, finally, it was fully compatible with InnoDB. Paytrail used the Severalnines tools to configure and deploy a MariaDB Galera Cluster, and with ClusterControl could quickly verify characteristics like failover and performance, as well as operational requirements like scaling, upgrades, backups and point-in-time recovery.

Outcome

No schema changes were required: all database tables were already using InnoDB, so it was a matter of taking a mysqldump of the existing data and restoring it on a Galera node. Applications required no changes.

By using HAProxy in front of the Galera nodes, all applications could connect to one virtual IP address and get routed to one available Galera node. The ops team had been anxious about the actual migration of the live database to Galera and wanted to reduce all risks.
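As an illustration of the HAProxy routing described above, the following is a minimal sketch of the kind of health check a load balancer could apply before sending traffic to a Galera node. The source does not say how Paytrail's checks were configured, so the logic here is an assumption. The status variables wsrep_cluster_status, wsrep_ready and wsrep_local_state are real Galera status variables; the node names, the dict input shape and the function names are hypothetical.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Galera reports wsrep_local_state = 4 when a node is Synced with the cluster.
WSREP_SYNCED = "4"


def is_routable(status: Mapping[str, str]) -> bool:
    """Return True if a node's status snapshot says it can safely take traffic.

    `status` is assumed to hold the rows of SHOW GLOBAL STATUS LIKE 'wsrep_%'
    collected into a dict of strings (hypothetical input shape).
    """
    return (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_ready") == "ON"
        and status.get("wsrep_local_state") == WSREP_SYNCED
    )


def pick_node(nodes: Sequence[Tuple[str, Mapping[str, str]]]) -> Optional[str]:
    """Return the first routable node name in priority order, or None."""
    for name, status in nodes:
        if is_routable(status):
            return name
    return None


# Hypothetical node names and status snapshots, for illustration only.
healthy = {"wsrep_cluster_status": "Primary", "wsrep_ready": "ON", "wsrep_local_state": "4"}
donor = {"wsrep_cluster_status": "Primary", "wsrep_ready": "ON", "wsrep_local_state": "2"}
nodes = [("dc1-db1", donor), ("dc1-db2", healthy), ("dc2-db1", healthy)]
print(pick_node(nodes))  # dc1-db2: the first node is a donor, so it is skipped
```

Routing every connection to a single healthy node at a time mirrors the earlier single-writer convention, so applications see one endpoint while node failures are handled behind the virtual IP.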

They established the Galera Cluster as a slave to the existing database setup, made sure the data was up to date, and then redirected their applications to the new cluster. The whole process, from initial evaluation to going live, took under four months.
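The "made sure the data was up to date" step can be sketched as a simple readiness check on the replication status of the Galera node that acts as the slave. This is a minimal sketch under assumptions, not Paytrail's actual procedure: it assumes the output of SHOW SLAVE STATUS has been collected into a dict, and the function name and the lag threshold are hypothetical. Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master are real columns of that statement.

```python
from typing import Any, Mapping


def ready_for_cutover(slave_status: Mapping[str, Any], max_lag_seconds: int = 0) -> bool:
    """Return True if the replica is running and has caught up with the old master.

    `slave_status` is assumed to be one row of SHOW SLAVE STATUS as a dict.
    """
    io_ok = slave_status.get("Slave_IO_Running") == "Yes"
    sql_ok = slave_status.get("Slave_SQL_Running") == "Yes"
    lag = slave_status.get("Seconds_Behind_Master")
    if lag is None:
        # The server reports NULL lag when replication is not running.
        return False
    return io_ok and sql_ok and int(lag) <= max_lag_seconds


# Hypothetical snapshots, for illustration only.
caught_up = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 0}
lagging = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 42}
stopped = {"Slave_IO_Running": "No", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": None}

print(ready_for_cutover(caught_up))  # True
print(ready_for_cutover(lagging))    # False
print(ready_for_cutover(stopped))    # False
```

In practice, writes to the old master would also need to be stopped before the final check, otherwise the replica can fall behind again between the check and the redirect.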

To go from initial evaluation to full production within 4 months would not have been possible without Severalnines.

The ops team uses ClusterControl to continuously monitor the performance of the cluster, find performance bottlenecks in Galera and identify slow-performing queries. A mixture of full backups and frequent incremental backups is scheduled via ClusterControl. Binlogs are saved so that a point-in-time recovery is possible in case of disaster.
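To make the interplay of full backups, incremental backups and binlogs concrete, here is a minimal sketch of how a point-in-time restore could be planned: restore the latest full backup taken before the target time, apply the incrementals taken after it, then replay binlogs up to the target. This is an illustrative simplification and not how ClusterControl implements it. The Backup type, the function name and the timestamps are hypothetical, and a real restore would start binlog replay from the position recorded by the backup rather than from a timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class Backup:
    kind: str              # "full" or "incremental"
    finished_at: datetime  # when the backup completed


def restore_plan(backups: List[Backup], target: datetime) -> Tuple[List[Backup], datetime]:
    """Return (backups to restore in order, time from which binlogs are replayed)."""
    # Only backups that completed at or before the target time are usable.
    usable = sorted(
        (b for b in backups if b.finished_at <= target),
        key=lambda b: b.finished_at,
    )
    full_indexes = [i for i, b in enumerate(usable) if b.kind == "full"]
    if not full_indexes:
        raise ValueError("no full backup exists before the target time")
    # Latest full backup plus every incremental taken after it.
    chain = usable[full_indexes[-1]:]
    return chain, chain[-1].finished_at


# Hypothetical schedule: a daily full backup and incrementals every six hours.
backups = [
    Backup("full", datetime(2024, 1, 1, 0, 0)),
    Backup("incremental", datetime(2024, 1, 1, 6, 0)),
    Backup("incremental", datetime(2024, 1, 1, 12, 0)),
    Backup("full", datetime(2024, 1, 2, 0, 0)),
    Backup("incremental", datetime(2024, 1, 2, 6, 0)),
]
chain, replay_from = restore_plan(backups, target=datetime(2024, 1, 2, 9, 30))
print([b.kind for b in chain])  # ['full', 'incremental']
print(replay_from)              # 2024-01-02 06:00:00
```

The more frequent the incremental backups, the shorter the binlog window that has to be replayed to reach the target time.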

The underlying MariaDB and Galera software is regularly updated using the ClusterControl upgrade functionality. Staying up to date with bug fixes and keeping up with the most stable releases is important to achieve a high security level.


Summary


Facing scalability issues with MySQL

Paytrail’s original multi-primary MySQL setup showed consistency and scalability issues. For example, when one primary failed, all traffic had to be rerouted to the other, causing application-wide disruptions. Something had to change.


Swapping in MariaDB Galera…with help

MariaDB Galera was a good substitute, as it could handle failures in a multi-data center environment gracefully and didn’t require app rewrites. ClusterControl and the Severalnines team helped Paytrail make the switch in under 4 months.


Handling all ops with ClusterControl

The ops team now uses ClusterControl to handle key operations, including performance monitoring (at the server and database level, e.g. query monitoring), scheduling full and incremental backups, and performing database upgrades.
