Most databases grow in size over time. The growth is not always fast enough to impact the performance of the database, but there are definitely cases where that happens. When it does, we often wonder what could be done to reduce that impact and how can we ensure smooth database operations when dealing with data on a large scale.
First of all, let’s try to define what does a “large data volume” mean? For MySQL or MariaDB it is uncompressed InnoDB. InnoDB works in a way that it strongly benefits from available memory – mainly the InnoDB buffer pool. As long as the data fits there, disk access is minimized to handling writes only – reads are served out of the memory. What happens when the data outgrows memory? More and more data has to be read from disk when there’s a need to access rows, which are not currently cached. When the amount of data increase, the workload switches from CPU-bound towards I/O-bound. It means that the bottleneck is no longer CPU (which was the case when the data fit in memory – data access in memory is fast, data transformation and aggregation is slower) but rather it’s the I/O subsystem (CPU operations on data are way faster than accessing data from disk.) With increased adoption of flash, I/O bound workloads are not that terrible as they used to be in the times of spinning drives (random access is way faster with SSD) but the performance hit is still there.
Another thing we have to keep in mind that we typically only care about the active dataset. Sure, you may have terabytes of data in your schema but if you have to access only last 5GB, this is actually quite a good situation. Sure, it still pose operational challenges, but performance-wise it should still be ok.
Let’s just assume for the purpose of this blog, and this is not a scientific definition, that by the large data volume we mean case where active data size significantly outgrows the size of the memory. It can be 100GB when you have 2GB of memory, it can be 20TB when you have 200GB of memory. The tipping point is that your workload is strictly I/O bound. Bear with us while we discuss some of the options that are available for MySQL and MariaDB.
The historical (but perfectly valid) approach to handling large volumes of data is to implement partitioning. The idea behind it is to split table into partitions, sort of a sub-tables. The split happens according to the rules defined by the user. Let’s take a look at some of the examples (the SQL examples are taken from MySQL 8.0 documentation)
MySQL 8.0 comes with following types of partitioning:
It can also create subpartitions. We are not going to rewrite documentation here but we would still like to give you some insight into how partitions work. To create partitions, you have to define the partitioning key. It can be a column or in case of RANGE or LIST multiple columns that will be used to define how the data should be split into partitions.
HASH partitioning requires user to define a column, which will be hashed. Then, the data will be split into user-defined number of partitions based on that hash value:
CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT '1970-01-01', separated DATE NOT NULL DEFAULT '9999-12-31', job_code INT, store_id INT ) PARTITION BY HASH( YEAR(hired) ) PARTITIONS 4;
In this case hash will be created based on the outcome generated by YEAR() function on ‘hired’ column.
KEY partitioning is similar with the exception that user define which column should be hashed and the rest is up to the MySQL to handle.
While HASH and KEY partitions randomly distributed data across the number of partitions, RANGE and LIST let user decide what to do. RANGE is commonly used with time or date:
CREATE TABLE quarterly_report_status ( report_id INT NOT NULL, report_status VARCHAR(20) NOT NULL, report_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP ) PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated) ) ( PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-01 00:00:00') ), PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-04-01 00:00:00') ), PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-07-01 00:00:00') ), PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-10-01 00:00:00') ), PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-01-01 00:00:00') ), PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-04-01 00:00:00') ), PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-07-01 00:00:00') ), PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-10-01 00:00:00') ), PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2010-01-01 00:00:00') ), PARTITION p9 VALUES LESS THAN (MAXVALUE) );
It can also be used with other type of columns:
CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT '1970-01-01', separated DATE NOT NULL DEFAULT '9999-12-31', job_code INT NOT NULL, store_id INT NOT NULL ) PARTITION BY RANGE (store_id) ( PARTITION p0 VALUES LESS THAN (6), PARTITION p1 VALUES LESS THAN (11), PARTITION p2 VALUES LESS THAN (16), PARTITION p3 VALUES LESS THAN MAXVALUE );
The LIST partitions work based on a list of values that sorts the rows across multiple partitions:
CREATE TABLE employees ( id INT NOT NULL, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT '1970-01-01', separated DATE NOT NULL DEFAULT '9999-12-31', job_code INT, store_id INT ) PARTITION BY LIST(store_id) ( PARTITION pNorth VALUES IN (3,5,6,9,17), PARTITION pEast VALUES IN (1,2,10,11,19,20), PARTITION pWest VALUES IN (4,12,13,14,18), PARTITION pCentral VALUES IN (7,8,15,16) );
What is the point in using partitions you may ask? The main point is that the lookups are significantly faster than with non-partitioned table. Let’s say that you want to search for the rows which were created in a given month. If you have several years worth of data stored in the table, this will be a challenge – an index will have to be used and, as we know, indexes help to find rows but accessing those rows will result in a bunch of random reads from the whole table. If you have partitions created on year-month basis, MySQL can just read all the rows from that particular partition – no need for accessing index, no need for doing random reads: just read all the data from the partition, sequentially, and we are all set.
Partitions are also very useful in dealing with data rotation. If MySQL can easily identify rows to delete and map them to single partition, instead of running DELETE FROM table WHERE …, which will use index to locate rows, you can truncate the partition. This is extremely useful with RANGE partitioning – sticking to the example above, if we want to keep data for 2 years only, we can easily create a cron job, which will remove old partition and create a new, empty one for next month.
If we have a large volume of data (not necessarily thinking about databases), the first thing that comes to our mind is to compress it. There are numerous tools that provide an option to compress your files, significantly reducing their size. InnoDB also has an option for that – both MySQL and MariaDB supports InnoDB compression. The main advantage of using compression is the reduction of the I/O activity. Data, when compressed, is smaller thus it is faster to read and to write. Typical InnoDB page is 16KB in size, for SSD this is 4 I/O operations to read or write (SSD typically use 4KB pages). If we manage to compress 16KB into 4KB, we just reduced I/O operations by four. It does not really help much regarding dataset to memory ratio. Actually, it may even make it worse – MySQL, in order to operate on the data, has to decompress the page. Yet it reads compressed page from disk. This results in InnoDB buffer pool storing 4KB of compressed data and 16KB of uncompressed data. Of course, there are algorithms in place to remove unneeded data (uncompressed page will be removed when possible, keeping only compressed one in memory) but you cannot expect too much of an improvement in this area.
It is also important to keep in mind how compression works regarding the storage. Solid state drives are norm for database servers these days and they have a couple of specific characteristics. They are fast, they don’t care much whether traffic is sequential or random (even though they still prefer sequential access over the random). They are expensive for large volumes. They suffer from “worn out” as they can handle a limited number of write cycles. Compression significantly helps here – by reducing the size of the data on disk, we reduce the cost of the storage layer for database. By reducing the size of the data we write to disk, we increase the lifespan of the SSD.
Unfortunately, even if compression helps, for larger volumes of data it still may not be enough. Another step would be to look for something else than InnoDB.
MyRocks is a storage engine available for MySQL and MariaDB that is based on a different concept than InnoDB. My colleague, Sebastian Insausti, has a nice blog about using MyRocks with MariaDB. The gist is, due to its design (it uses Log Structured Merge, LSM), MyRocks is significantly better in terms of compression than InnoDB (which is based on B+Tree structure). MyRocks is designed for handling large amounts of data and to reduce the number of writes. It originated from Facebook, where data volumes are large and requirements to access the data are high. Thus SSD storage – still, on such a large scale every gain in compression is huge. MyRocks can deliver even up to 2x better compression than InnoDB (which means you cut the number of servers by two). It is also designed to reduce the write amplification (number of writes required to handle a change of the row contents) – it requires 10x less writes than InnoDB. This, obviously, reduces I/O load but, even more importantly, it will increase lifespan of a SSD ten times compared with handing the same load using InnoDB). From a performance standpoint, smaller the data volume, the faster the access thus storage engines like that can also help to get the data out of the database faster (even though it was not the highest priority when designing MyRocks).
At some point all we can do is to admit that we cannot handle such volume of data using MySQL. Sure, you can shard it, you can do different things but eventually it just doesn’t make sense anymore. It is time to look for additional solutions. One of them would be to use columnar datastores – databases, which are designed with big data analytics in mind. Sure, they will not help with OLTP type of the traffic but analytics are pretty much standard nowadays as companies try to be data-driven and make decisions based on exact numbers, not random data. There are numerous columnar datastores but we would like to mention here two of those. MariaDB AX and ClickHouse. We have a couple of blogs explaining what MariaDB AX is and how can MariaDB AX be used. What’s important, MariaDB AX can be scaled up in a form of a cluster, improving the performance. ClickHouse is another option for running analytics – ClickHouse can easily be configured to replicate data from MySQL, as we discussed in one of our blog posts. It is fast, it is free and it can also be used to form a cluster and to shard data for even better performance.
We hope that this blog post gave you insights into how large volumes of data can be handled in MySQL or MariaDB. Luckily, there are a couple of options at our disposal and, eventually, if we cannot really make it work, there are good alternatives.