Severalnines Blog
The automation and management blog for open source databases

Two Database Sharding Tools for MySQL

Severalnines
Posted in:

In a previous blog post we discussed several approaches to sharding. The most flexible one, sharding using metadata, is also the most complex one to implement. You need to design the meta-database, and build high availability not only for your application data but also for the metadata. On top of that, you need to design your application so it will be aware of the complex database infrastructure beneath - it has to query metadata first and then it has to be directed to a correct shard to read or write data. You will also have to build tools to manage and maintain the metadata. Migrating data requires caution so it has to be done carefully. You also have to make sure that any operations on the production databases are mirrored in the metadata. For instance, have you taken a slave out of rotation? This should be reflected in the metadata. Have you added a new slave to a shard? You have to modify the metadata and add that host. As you can imagine, a lot of time and effort has to be put into developing and maintaining scripts and tools to manage such a setup. It begs the question -  is there some external solution to design, deploy and manage a sharded environment? In this post, we will cover a couple of solutions which are available and which may help you to build a scalable, sharded infrastructure.

Vitess

Vitess is a tool built to help manage sharded environments. It was developed to help scale out databases at Youtube. In short, it is a solution based on metadata - by default, it uses range sharding but it is also possible to implement a custom sharding schema. Topology data is stored and maintained in a service like Zookeeper or etcd. Application access data using a lightweight proxy, named ‘vtgate’ in Vitess’ nomenclature. Vtgate connects to the metadata store and checks the data distribution - this allows it to route queries to correct shards - ‘tablets’.

Vitess supports range sharding - the keyspace is divided into two or more partitions, each partition covering a range of data. To find ranges, Vitess has to use a column of some kind to calculate them - currently supported data types are BIGINT UNSIGNED and VARBINARY. This works very well with id’s which usually use unsigned integer format

MySQL Fabric

In 2014, Oracle introduced a new set of tools for MySQL, called “MySQL Fabric”. Historically, there was no official tool which would allow users to build highly available topologies, including sharded setups. The idea behind Fabric is to provide an “official” tooling for building such setups. It provides a framework and tools to manage groups of highly-available MySQL instances. It supports implementation of HA setups and scaling through sharding.

MySQL Fabric uses a concept of high-availability groups - a group contains two or more MySQL servers connected using replication (actually, you can have just a single host in a group but, obviously, it won’t be highly-available).

MySQL Fabric not only gives you the ability to maintain availability of your data - it also supports scaling out through sharding. The basic idea is - if we can configure a few servers into a single high-availability group, we can then scale by having more of them. Then we’d need to implement some kind of shard mapping - we need to decide which column to use for sharding and which tables should be sharded.

If you are interested in sharding, check out our ebook on sharding.

Database Sharding with MySQL Fabric

Why do we shard? How does sharding work? What are the different ways I can shard my database? This whitepaper goes through some of the theory behind sharding. It also discusses three different tools which are designed to help users shard their MySQL databases. And last but not least, it shows you how to set up a sharded MySQL setup based on MySQL Fabric and ProxySQL.

Download Here

Related Post

Online schema change with gh-ost - throttling and changing configuration at runtime

When performing schema changes online, it is important to be able to control the performance impact on our running databases. This blog post discusses gh-ost’s throttling mechanism. We will also cover the runtime configuration changes that gh-ost allows.

Posted in:

How to perform online schema changes on MySQL using gh-ost

This blog post shows you how to use GitHub’s gh-ost to perform an online schema change on a MySQL master-slave replication setup.

Posted in:

Planets9s - 2016’s most popular s9s resources

Planets9s is a weekly communication on all the latest Severalnines resources and technologies around automation and management of open source databases such as MySQL, MariaDB, PostgreSQL & MongoDB.

Posted in:

How to Perform Efficient Backup for MySQL and MariaDB

There are a number of ways to backup MySQL and MariaDB, each comes with pros and cons. An effective backup strategy will depend on the requirements of the business, the environment you operate in, and the resources at your disposal.

Posted in: