Database Sharding with MySQL Fabric

Why do we shard? How does sharding work? What are the different ways I can shard my database? This whitepaper goes through some of the theory behind sharding. It also discusses three different tools which are designed to help users shard their MySQL databases. And last but not least, it shows you how to set up a sharded MySQL setup based on MySQL Fabric and ProxySQL.

Table of contents

  • 1. Why Sharding?
  • 2. How does Sharding work?
    • 2.1. Functional sharding
    • 2.2. Expression-based sharding
    • 2.3. Metadata-based sharding
  • 3. Sharding solutions
    • 3.1. Vitess
      • 3.1.1. Tablets
      • 3.1.2. How sharding works in Vitess?
      • 3.1.3. Migration into Vitess cluster
    • 3.2. MySQL Fabric
      • 3.2.1. High availability in MySQL Fabric
      • 3.2.2. Scaling out with MySQL Fabric
      • 3.2.3. Query routing in MySQL Fabric
  • 4. Migrating into sharded environment with MySQL Fabric
    • 4.1. Environment overview
    • 4.2. Setting up MySQL Fabric
      • 4.2.1. Installation
      • 4.2.2. Initial setup
      • 4.2.3. Setting up global replication group
      • 4.2.4. Define shard mappings
      • 4.2.5. Creating shards
    • 4.3. Setting up MySQL Router
      • 4.3.1. Installation of MySQL Router
      • 4.3.2. Configuring MySQL Router
    • 4.4. Configuring ProxySQL for sharding
      • 4.4.1. Configuring hostgroups
      • 4.4.2. Configuring query rules
      • 4.4.3. Testing of query rules
    • 4.5. Cutover process
      • 4.5.1. Preparations
      • 4.5.2. Cutover
      • 4.5.3. Cleanup
    • 4.6. Typical operations in MySQL Fabric sharded environment
      • 4.6.1. Add node to shard
      • 4.6.2. Remove node from a shard
      • 4.6.3. Promote a secondary node in a shard
      • 4.6.4. Move shard to a different high availability group
      • 4.6.5. Splitting the shard
    • 4.7. High availability aspect
    • 4.8. Summary

1. Why Sharding?

Database systems with large data sets or high throughput applications can challenge the capacity of a single database server. High query rates can exhaust CPU capacity, I/O resources, RAM or even network bandwidth.

Horizontal scaling is often the only way to scale out your infrastructure. You can upgrade to more powerful hardware, but there is a limit on how much load a single host can handle. You may be able to purchase the most expensive and the fastest CPU or storage on the market, but it still may not be enough to handle your workload. The only feasible way to scale beyond the constraints of a single host is to utilize multiple hosts working together as a part of a cluster or connected using replication.

Horizontal scaling has its limits too, though.  When it comes to scaling reads, it is very efficient - just add a node and you can utilize additional processing power. With writes, things are completely different. Consider a MySQL replication setup. Historically, MySQL replication used a single thread to process writes - in a multi-user, highly concurrent environment, this was a serious limitation. This has changed recently. In MySQL 5.6, multiple schemas could be replicated in parallel. In MySQL 5.7, after addition of a ‘logical clock’ scheduler, it became possible for a single-schema workload to benefit from the parallelization of multi-threaded replication. Galera Cluster for MySQL also allows for multi-threaded replication by utilizing multiple workers to apply writesets. Still, even with those enhancements, you can get just some incremental improvement in the write throughput - it is not the solution to the problem.

One solution would be to split our data across multiple servers using some kind of a pattern and, in that way, to split writes across multiple MySQL hosts. This is sharding.

Want to read the rest?

Download the full whitepaper for free