It is not uncommon for developers, network/system administrators, or DevOps folks with general backgrounds to find themselves in a DBA role at some point in their careers. So, what does a DBA do? In the previous post, we covered monitoring and trending practices and some popular tools that you might find handy in your day-to-day work.
We’ll continue this blog series with another basic but crucial DBA responsibility – taking backups of your data. Backup and restoration are one of the most important aspects of database administration. If a database crashed and there was no way to recover it, any resulting data loss might lead to devasting results for a business. One could argue that you can protect against crashes by replicating to multiple servers or data centers. But if it is an application error that propagates to all instances or a human dropping a part of a database by mistake, you will probably need to restore from backup.
Different Backup Methodologies
There are multiple ways to take a backup of a MySQL database, but we can divide these methods into two groups – logical and physical.
Logical backups contain data that is exported using SQL commands and stored in a file. This can be, e.g., a set of SQL commands (INSERTs) that, when executed, will restore the database’s content. It does not have to be SQL code. It can be anything that is restorable – you can also use SELECT … INTO OUTFILE to generate a file with your database contents. You can store your backup in CSV files with some modifications to the output file’s syntax.
Physical backups are copies of physical database files. Here, we would make a binary copy of a whole database by, for example, copying all of the files or by making a snapshot of a volume where the data directory is located.
A logical backup is usually slower than a physical one because of the overhead of executing SQL commands to get the data out and then executing another set of SQL commands to get the data back into the database. This severe limitation prevents the logical backup from being a single backup method on large (high tens or hundreds of gigabytes) databases. On the other hand, a major advantage of the logical backup is the fact that having all data in the SQL format, you can restore single rows.
Physical backups are not that flexible – while some of the methods make it possible to restore separate tables, you cannot go down to row level. On the other hand, this is the fastest way to backup and restore your database – you are limited only by the performance of your hardware – disk speed and network throughput will be the main limiting factor. One more important concept when it comes to the MySQL backup is point-in-time recovery. A logical or physical backup takes place at a given time. This is not enough, you have to be able to restore your database to any point in time, also to a point that happened between the backups. In MySQL, the main way to handle point-in-time recovery is to use by Mini – Adblocker”> binary logs to replay the workload. With that in mind, a backup is not complete unless you make a copy of the binlogs along with it.
Logical Backup Methods
The most known method is mysqldump, a CLI tool that enables the DBA to create an SQL dump of the database. Mysqldump is a single-threaded tool, and this is its most significant drawback – performance is ok for small databases, but it becomes quickly unacceptable if the data set grows to tens of gigabytes. If you plan to use mysqldump to do backups, you must keep a few things in mind. First, by default mysqldump doesn’t include routines and events in its output – you have to explicitly set –routines (-R) and –events (-E) flags. Second, if you want a consistent backup, things become tricky. As long as you use InnoDB only, you can use –single-transaction flag and you should be all set. You can also use –apply-slave-statements to get the CHANGE MASTER statements at the beginning of the dump if you plan to create a slave using the backup. If you have other non-transactional tables (MyISAM for example), then mysqldump will have to lock the whole database to ensure consistency. This is a serious drawback and may be one of the reasons why mysqldump won’t work for you.
By default, mysqldump creates a file where you’ll first find SQL to generate the schema and then SQL to restore data. To have more flexibility, you may change this behavior and script the backup in such a way that it creates a schema dump first and then the rest of the data. Additionally, you may also want to script the backup process so that it stores separate tables in separate sql files. This will be handy when you must restore several rows or compare current data with the previous day’s data. It’s all about the file size: separate dumps created per table will likely be smaller and more manageable. E.g., in case you want to use a CLI tool to find a given row in the SQL file.
SELECT … INTO OUTFILE
This is more of a way how mysqldump works rather than a separate backup method, but it’s distinct enough to be included here. Mysqldump can be executed in a mode where, instead of SQL syntax, it will generate a backup in another way. In general, its format is similar to CSV with the difference that the actual format can be defined by the user. By default, it is tab-separated instead of comma-separated. This format is faster to load than SQL dump (you can use LOAD DATA INFILE to make it happen), but it is also harder to use to restore a single row. Most people probably don’t remember LOAD DATA INFILE syntax, while almost everybody can run SQL.
Those tools work in a pair to overcome the main pain point of mysqldump – single thread. Mydumper can be used to generate a backup of the data (and data only, you need also to use mysqldump –no-data to get a dump of the schema) and then load it. Both processes can use multiple threads. You can either split the workload per table, or you can define a chunk size, and then large tables will also be worked on by numerous threads. It’s still a logical backup, so the process may still take a while. Based on numbers reported by different users, mydumper can load data up to 2-3 times faster. The process may still take days, though – depending on the database size, row size etc.
Even if the restore time is not acceptable for your data set, you still may be interested in mydumper because of periodical MySQL upgrades. For any major version upgrade (like 5.5 -> 5.6 or upcoming 5.6 -> 5.7), the recommended way for an upgrade is to perform a logical dump of the data and then load it back up. In such a case, time is not that crucial, but it is still much better to finish the restore in 2-3 days using mydumper/myloader rather than 6 – 9 days using mysqldump.
Physical Backup Methods
Percona’s xtrabackup is the backup method for MySQL. It is a tool that allows the DBA to take a (virtually) non-blocking snapshot of the InnoDB database. It works by copying the data files physically from one volume to another location. You can also stream the backup over the network, to a separate backup host where the backup will be stored. While copying the data, it keeps an eye on the InnoDB redo log and writes down any change that happened in the meantime. In the end, it executes FLUSH TABLES WITH READ LOCK (that’s why we used the word ‘virtually’) and finalizes the backup. Thanks to the last lock, the backup is consistent. If you use MyISAM tables, xtrabackup is more impacting as the non-transactional tables have to be copied over the network while FTWRL is in place – depending on the size of those tables, may take a while. During that time, no query will be executed on the host.
Restore is pretty simple – especially if you apply redo logs to the backup taken. Theoretically speaking, you could as well start MySQL without any further actions, but then InnoDB recovery will have to be performed at the start. This process takes time. Preparing the backup first (by applying redo logs) can be done in its own time. When the backup needs to be (quickly) restored, you won’t have to go over this process. To speed up the backup preparing phase (using –apply-log) you may increase the memory available for xtrabackup using –use-memory flag. As long as you have several gigabytes of free memory, you can use them here to speed up the process significantly.
Xtrabackup is probably the most popular tool out there and it’s not without reason. It is very flexible, you can use multiple threads to copy the files quicker (as long as your hardware permits), and you can use compression to minimize the backup size. As we mentioned, it is possible to create a backup locally or stream it over the network using (for example) an SSH tunnel or netcat. Xtrabackup allows you to create incremental backups which take significantly less disk space than the full one and won’t take as much time. When restoring, though, it is a slower process as deltas must be applied one after another, which may take a significant amount of time.
Another feature of xtrabackup is its ability to backup single schemas or even tables. It has its uses but also limitations. First of all, it can be used to restore several rows that got dropped accidentally. It is still a less efficient way of doing this than restoring that data from an SQL dump, as you’d have to create a separate host, restore the given table, dump missing rows and load them onto the production server – you cannot restore a whole table because you’ll be missing data that happened after the backup was taken. It is possible to work it out with by Mini – Adblocker”> binary logs, but it will take too much time to be feasible. On the other hand, if a whole table or schema is missing, you should be able to restore that pretty easily.
The main advantage of the xtrabackup over logical backups is its speed – performance is limited by your disk or network throughput. On the other hand, it’s much harder to recover single rows from the database. The ideal use case for xtrabackup is to recover a whole host from scratch or provision a new server. It comes with options to store information about MySQL replication or Galera writeset replication along with the backup. This is very useful if you need to provision a new replication slave or a new node in a cluster.
We’ll be talking here about backing up MySQL using snapshots – it does not matter much how you are taking those snapshots. It can be either LVM installed on a host (using LVM is not an uncommon way of setting up MySQL servers), or it could be a “cloudish” snapshot – EBS snapshot, or its equivalent in your environment. If you use SAN as a storage for your MySQL server and you can generate a snapshot of a volume, it also belongs here. We will focus mostly on the AWS, though – it’s the most popular cloud environment.
In general, snapshots are a great way of backing up any data – it is quick, and while it adds some overhead, there are more pros of this method than cons. The main problem with backing up MySQL using the snapshot is consistency – taking a snapshot on the server is comparable to a forced power off. You should be fine if you run your MySQL server in full durability mode. If not, it is possible that some of the transactions won’t make it to disk and, as a result, you will lose data. Of course, there are ways of dealing with this issue. First of all, you can change durability settings to more durable (SET GLOBAL innodb_flush_log_at_trx_commit=1, SET GLOBAL sync_binlog=1) prior to the snapshot and then revert to the original settings after a snapshot has been started. This is the least impacting way of making sure your snapshot is consistent. Another method includes stopping a slave (if the replication is the only means of modifying data on a given host) and then running FLUSH TABLES. You can also stop the activity by using FLUSH TABLES WITH READ LOCK to get a consistent database state. What is important to keep in mind, though, is that no matter which approaches you take, you will end up with data in a “crashed” state – if you’d like to use this data to create a new MySQL server, at the first start MySQL will have to perform recovery procedures on InnoDB tables. On the other hand, InnoDB recovery may take a while, hours even – depending on the amount of modifications.
One way to go around this problem is to take cold backups. As they involve stopping MySQL before taking a snapshot, you can be sure that data is consistent, and it’s all just a matter of starting MySQL to get a new server up. No recovery is needed because data came from a server that did a clean shutdown. Of course, stopping MySQL servers is not an ideal way to handle backups, but sometimes it is feasible. For example, maybe you have a slave dedicated to ad-hoc queries, executed manually, which does not have to be up all the time. You could also use such a server as a backup host, shutting down MySQL from time to time to take a clean snapshot of its data.As we discussed above, getting a consistent snapshot may be tricky sometimes. On the pro side, snapshots are a great way of provisioning new instances. This is true, especially in the cloud, where you can easily create a new node using a few clicks or API calls. That it’s all true as long as you use a single volume for your data directory. Until recently, to get a decent I/O performance in EC2, the by Mini – Adblocker”> only option was to use multiple EBS volumes and set up a RAID0 over them. It was caused by a limit of how many pIOPS a single EBS instance may have. This limit has increased significantly (to 20k pIOPS), but even now, there are still reasons to use RAIDed approach. In such a setup, you can’t just take snapshots and hope for the best – such snapshots will be inconsistent on RAID level, not to mention MySQL level. Cold backup will still work, as MySQL is down, and no disk activity should happen (as long as MySQL data directory is located on a separate device). For more “hot” approaches, you may want to look at ec2-consistent-snapshot – a tool that gives you some options on how to perform a consistent snapshot of a RAIDed volume with several EBSes under the hood. It can help you to automate some MySQL tasks like stopping a slave and running FLUSH TABLES WITH READ LOCK. It can also freeze the filesystem on the operating system level. ec2-consistent-snapshot is tricky to set up and needs detailed tests, but it is one of the options to pick from.
Best Practices and Guidelines
We covered some ways in which you can take a backup of the MySQL database. It is time to put it together and discuss how you could set up an efficient backup process.
The main problem is that all backup methods have pros and cons. They also have their requirements regarding how they affect regular workloads. As usual, how you’d like to make backups depends on the business requirements, environment, and resources. We’d still like to share some guidelines with you.
First of all, you want to have the ability to perform point-in-time recovery. It means that you have to copy by Mini – Adblocker”> binary logs along your backup. It can be either copied from disk to disk or EBS snapshot of a volume where binlogs are located – you have to have them available.
Second – you probably want to have the ability to restore single rows. Now, everything depends on your environment. One way would be to take a logical backup of your system but it may be hard to execute on a large data set. On the other hand, if you can restore a database from the physical backup (for example, click to create a new EBS volume out of the snapshot, click to create a new EC2 instance, click to attach EBS to it), you could be just fine with this process, and you won’t have to worry about the logical backup at all.
For larger databases, you will be forced to use one of the physical backup methods because of the time needed to perform a logical one. Next question – how often do you want to perform a backup? You have by Mini – Adblocker”> binary logs so, theoretically speaking, it should be just fine to get a backup once per day and restore the rest of the data from binlogs. In the real world, replaying binlogs is a slow and painful process. Of course, your mileage may be wary – it all depends on the number of modifications to the database. So, you need to test it – how quickly you can process and replay binary logs in your environment? What does it look like compared to your business requirements to determine the maximum allowed downtime? If you use snapshots – how long the recovery process takes? Or, how often can you stop your MySQL and take a snapshot if you use a cold backup approach? Even on a dedicated instance, you can’t do it more often than once per 15 – 30 minutes, workload and traffic permitting. Remember, cold backup = replication lag, no matter if you use regular replication or Galera Cluster (in Galera it’s just called differently – node is in Desync state and applying missing writesets after IST). The backup node has to be able to catch up between backups.
Xtrabackup is a great tool for taking backups – using its incremental backup feature, you can easily take deltas every five minutes or so. On the other hand, restoring those increments may take a long time and is error-prone – there is a bunch of not-yet discovered bugs in both xtrabackup and InnoDB, which sometimes corrupts backups and render them useless. If one of the incremental backups is corrupted, the rest will not be usable. This leads us to another important point – how good is the backup data?
You have to test your backups. We mentioned it in a previous post – as a part of the health check you should check if the backup, whichever method you choose, looks sane. Looking at file sizes is not enough, though. From time to time, for example, on monthly basis (but again, it depends on your business requirements), you should perform a full restore test – get a test server, install MySQL, restore data from the backup, and test that you can join the Galera cluster or slave it off the master. Having backups is not enough – you must ensure working backups.
We hope this introduction to MySQL backup methods will help you find your solution for safeguarding your data. The main thing to remember is that you should not be afraid of testing – if you don’t know whether your backup process design makes sense, do test it. As long as you have a working backup process that fulfills your organization’s requirements, there is no bad way of designing it. Just remember to test the restore from time to time and ensure you still can restore the database promptly – databases change their content too. Usually, it grows. What was acceptable one year ago may not be sufficient today – you also need to consider that.
Hopefully, the above comprehensive post has given you a good overview of MySQL backup and restore best practices. Read more here about the process of a full restore of a MySQL or MariaDB Galera cluster from backup.