blog

Webinar Replay: How to Build Scalable Database Infrastructures with MariaDB & HAProxy

Jean-Jérôme Schmidt

Published:

Thanks to everyone who participated in last week’s live webinar on how CloudStats.me moved from MySQL to clustered MariaDB for high availability with Severalnines ClusterControl. The webinar included use case discussions on cloudstats.me’s database infrastructure bundled with a live demonstration of ClusterControl to illustrate the key elements that were discussed.

We had a lot of questions in the audience and you can read through the transcript of these further below in this blog.

If you missed the session and/or would like to watch the replay in your own time, it is now available online for sign up and viewing.

Replay Details

Get access to the replay

Agenda

  • CloudStats.me infrastructure overview
  • Database challenges
  • Limitations in cloud-based infrastructure
  • Scaling MySQL – many options
    • MySQL Cluster, Master-Slave Replication, Sharding, …
  • Availability and failover
  • Application sharding vs auto-sharding
  • Migration to MariaDB / Galera Cluster with ClusterControl & NoSQL
  • Load Balancing with HAProxy & MaxScale
  • Infrastructure set up provided to CloudStats.me
    • Private Network, Cluster Nodes, H/W SSD Raid + BBU
  • What we learnt – “Know your data!”

Speakers

Andrey Vasilyev, CTO of Aqua Networks Limited – a London-based company which owns brands, such as WooServers.com, CloudStats.me and CloudLayar.com, and Art van Scheppingen, Senior Support Engineer at Severalnines, discussed the challenges encountered by CloudStats.me in achieving database high availability and performance, as well as the solutions that were implemented to overcome these challenges.

If you have any questions or would like a personalised live demo, please do contact us.

Follow our technical blogs: https://severalnines.com/blog


Questions & Answers – Transcript

Maybe my question is not directly related to the topic of the webinar… But will your company (I mean Severalnines) in the future also consider the possibility to install and setup Pivotal’s Greenplum database?
Currently, there are no plans for us that, as we have not received requests to support Greenplum yet. But it’s something we’ll keep in mind!

What about Spider and ClusterControl? Is this combination available / being used?
Spider can be used independently of ClusterControl, since ClusterControl can be used to manage the individual MySQL instances. We are not aware of any ClusterControl users who are using Spider.

Is MySQL Cluster NDB much faster than a Galera Cluster?
MySQL NDB and Galera Cluster are two different types of clustering. The main difference is that in Galera Cluster all nodes are equal and contain the same dataset, while with NDB Cluster the data nodes contain sharded/mirrored data sets. NDB Cluster can handle larger data sets to write, but if you need multiple equal MySQL master nodes Galera is a better choice. Galera is also faster in replicating data than a traditional MySQL replication due to the ability to write all queries in parallel.

Does CloudStats also support database backups on the end user level?
CloudStats can backup your files to S3, Azure, locally etc., but for database backup, it’s best to use ClusterControl, while CloudStats is for the rest of files.

Is it possible to restore the structure and the whole setup of a previous ClusterControl infrastructure from the backups?
Yes, that would be possible, if you make backups of your existing ClusterControl database and configuration files.

I’m using Maxscale with Galera. The Read/Write Split modules drop the connection on very intensive operations involving reading and then writing of rows up to 80,000, but works fine with readconnroute module (which doesn’t split). Anyway I can scale writes involving just Galera?
You could create two readconnroute interfaces using MaxScale and use one for writes only. You can do this by adding router_options=master to the configuration and with a Galera cluster this will only write to one single node in the Galera cluster.

Cluster is fast as the slowest node? like NODE1-SSD, NODE2-SSD, NODE3-SATA…
Yes, within Galera Cluster your slowest node will determine the speed of the whole cluster

Galera cluster is INNODB only. If it is. Is it recommended not to use MyISAM?
In principle Galera is InnoDB only, however there is limited support for MyISAM if you encapsulate your queries in a transaction. As there is no guarantee the data will be kept equal on all nodes due to MyISAM not being a transactional storage engine and this could cause data drift to happen. Using MyISAM with Galera is not advised to do.

Virtualized nodes should then be on SSD host storage. Not network storage because IOPS will be low. Correct?
Yes, that’s correct, its best to store it on a local ssd.

MySQLdump is slow right?
MySQLdump is dumping the entire contents of your database as a logical backup and therefore slower than Xtrabackup.

HAProxy instances are installed on 2x cluster control servers?
HAProxy instances are usually installed on dedicated hosts, not on the CC node.

What about MySQL proxy, and use cases with that tool? And would it be better to just split query R/W at application level?
MySQL proxy can be used, but the tool is not maintained anymore and we found HAProxy and MaxScale are better options.

HAProxy can run custom scripts and these statuses can also be manually created right?
Exactly, you can do that. ClusterControl just has a few preset checks by default but you can change them if you like

In your experience, how does EC2 perform with a MariaDB based cluster?
According to our Benchmarks, EC2 M3.Xlarge instances showed a Read Output performance of 16914 and Write Input Performance of 31092, which is 2 times higher than a similar sized Microsoft Azure DS3 instance (16329 iops for Reads and 15900 iops for Writes). So yes, according to our test AWS might perform better than Azure for Write performance, but it will depend on your application size and requirements. A local SSD storage on a server might be recommended for higher iops performance.

Blue – Reads
Red  – Writes

Does WooServers offer PCI DSS compliant servers?
Yes, WooServers offer PCI DSS servers and are able to manage your current infrastructure that you have, either on Azure, on premises or AWS.

AAA pluggable / scriptable? A customer came up with Radius recently…
Unfortunately the authentication/authorization is limited to either ClusterControl internal AAA or LDAP only.

Also: GUI functions accessible via JSON/HTTP API ?
Yes, our most important GUI functions are available through our RPC api. So you would be able to automate deploying, backups and scaling easily.

Subscribe below to be notified of fresh posts