It is not uncommon that developers, network/system administrators, or DevOps folks with general backgrounds, find themselves in a DBA role at some point in their career. So, what does a DBA do? In the previous post, we covered monitoring and trending practices, as well as some popular tools that you might find handy in your day to day work.
We’ll continue this blog series with another basic but crucial DBA responsibility - taking backups of your data. Backup and restore is one of the most important aspects of database administration. If a database crashed and there was no way to recovery it, any resulting data loss might lead to devasting results to a business. One could argue that you can protect against crashes by replicating to multiple servers or data centers. But if it is an application error that propagates to all instances, or a human dropping a part of a database by mistake, you will probably need to restore from backup.
Different backup methodologies
There are multiple ways to take a backup of a MySQL database, but we can divide these methods into two groups - logical and physical.
Logical backups contain data that is exported using SQL commands and stored in a file. This can be, for e.g., a set of SQL commands (INSERTs), that, when executed, will result in restoring a content of the database. It does not have to be SQL code, it can be anything that is restorable - you can as well use SELECT … INTO OUTFILE to generate a file with your database contents. With some modifications to the output file’s syntax, you can store your backup in CSV files.
Physical backups are copies of physical database files. Here, we would make a binary copy of a whole database by, for example, copying all of the files or by making a snapshot of a volume where data directory is located.
A logical backup is usually slower than a physical one, because of the overhead to execute SQL commands to get the data out and then to execute another set of SQL commands to get the data back into the database. This is a severe limitation that tends to prevent the logical backup from being a single backup method on large (high tens or hundreds of gigabytes) databases. On the other hand, a major advantage of the logical backup is the fact that, having all data in the SQL format, you can restore single rows.
Physical backups are not that flexible - while some of the methods make it possible to restore separate tables, you cannot go down to row level. On the other hand, this is a fastest way to backup and restore your database - you are limited only by the performance of your hardware - disk speed and network throughput will be the main limiting factor.
One more important concept, when it comes to the MySQL backup, is point in time recovery. A backup, whether logical or physical, takes place at a given time. This is not enough, you have to be able to restore your database to any point in time, also to a point which happened between the backups. In MySQL, the main way to handle point in time recovery is to use binary logs to replay the workload. With that in mind, a backup is not complete unless you make a copy of the binlogs along with it.
Logical backup methods
The most known method is definitely mysqldump, a CLI tool that enables the DBA to create an SQL dump of the database. Mysqldump is a single-threaded tool and this is its most significant drawback - performance is ok for small databases but it becomes quickly unacceptable if the data set grows to tens of gigabytes. If you plan to use mysqldump as a mean of taking backups, you need to keep a few things in mind. First, by default mysqldump doesn’t include routines and events in its output - you have to explicitly set --routines (-R) and --events (-E) flags. Second, if you want to take a consistent backup then things become tricky. As long as you use InnoDB only, you can use --single-transaction flag and you should be all set. You can also use --apply-slave-statements to get the CHANGE MASTER statements at the beginning of the dump if you plan to create a slave using the backup. If you have other non-transactional tables (MyISAM for example), then mysqldump will have to lock the whole database to ensure consistency. This is a serious drawback and may be one of the reasons why mysqldump won’t work for you.
By default, mysqldump creates a file where you’ll first find SQL to create the schema and then SQL to restore data. To have more flexibility, you may change this behavior and script the backup in such a way that it creates a schema dump first and then the rest of the data. Additionally, you may also want to script the backup process so that it stores separate tables in separate sql files. This will come in handy when you need to restore several rows or to compare current data with the previous day’s data. It’s all about the file size: separate dumps, created per table, will likely to be smaller and more manageable. E.g., in case you want to use a CLI tool to find a given row in the SQL file.
SELECT … INTO OUTFILE
This is more of a way how mysqldump works rather than a separate backup method, but it’s distinct enough to be included here. Mysqldump can be executed in a mode where, instead of SQL syntax, it will generate a backup in some other way. In general, its format is similar to CSV with a difference that the actual format can be defined by the user. By default, it is tab-separated instead of comma-separated.
This format is faster to load than SQL dump (you can use LOAD DATA INFILE to make it happen) but it is also harder to use to restore a single row. Most people probably don’t remember LOAD DATA INFILE syntax, while almost everybody can run SQL.
Those tools work in pair to overcome the main pain-point of mysqldump - single thread. Mydumper can be used to generate a backup of the data (and data only, you need also to use mysqldump --no-data to get a dump of the schema) and then load it. Both processes can use multiple threads. You can either split the workload per table or you can define a size of a chunk and then large tables will also be worked on by numerous threads. It’s still a logical backup so the process may still take a while. Based on numbers reported by different users, mydumper can load data up to 2-3 times faster. The process may still take days, though - depending on the database size, row size etc.
Even if the restore time is not acceptable for your data set, you still may be interested in mydumper because of periodical MySQL upgrades. For any major version upgrade (like 5.5 -> 5.6 or upcoming 5.6 -> 5.7), the recommended way for an upgrade is to perform a logical dump of the data and then load it back up. In such a case, time is not that crucial but it is still much better to finish the restore in 2-3 days using mydumper/myloader rather than 6 - 9 days using mysqldump.
Physical backup methods
Percona’s xtrabackup is the backup method for MySQL. It is a tool that allows the DBA to take a (virtually) non-blocking snapshot of the InnoDB database. It works by copying the data files physically from one volume to another location. You can also stream the backup over the network, to a separate backup host where the backup will be stored. While copying the data, it keeps an eye on the InnoDB redo log and writes down any change that happened in the meantime. At the end, it executes FLUSH TABLES WITH READ LOCK (that’s why we used a word ‘virtually’) and finalizes the backup. Thanks to the last lock, the backup is consistent. If you use MyISAM tables, xtrabackup is more impacting as the non-transactional tables have to be copied over the network while FTWRL is in place - this, depending on the size of those tables, may take a while. During that time, no query will be executed on the host.
Restore is pretty simple - especially if you apply redo logs to the backup taken. Theoretically speaking, you could as well start MySQL without any further actions but then InnoDB recovery will have to be performed at the start. This process takes time. Preparing the backup first (by applying redo logs) can be done in its own time. When the backup needs to be (quickly) restored, you won’t have to go over this process. To speed up the backup preparing phase (using --apply-log) you may increase memory available for xtrabackup using --use-memory flag. As long as you have several gigabytes of free memory, you can use them here to speed up the process significantly.
Xtrabackup is probably the most popular tool out there and it’s not without reason. It is very flexible, you can use multiple threads to copy the files quicker (as long as your hardware permits it), you can use compression to minimize size of the backup. As we mentioned, it is possible to create a backup locally or stream it over the network using (for example) an SSH tunnel or netcat. Xtrabackup allows you to create incremental backups which take significantly less disk space than the full one and won’t take as much time. When restoring, though, it is a slower process as deltas have to be applied one after another and it may take significant amount of time.
Another feature of xtrabackup is its ability to backup single schemas or even tables. It has its uses but also limitations. First of all, it can be used to restore several rows that got dropped accidently. It is still a less efficient way of doing this than restoring that data from an SQL dump, as you’d have to create a separate host, restore the given table, dump missing rows and load them onto the production server - you cannot restore a whole table because you’ll be missing data that happened after the backup was taken. It is possible to work it out with binary logs but it will take too much time to be feasible. On the other hand, if a whole table or schema is missing, you should be able to restore that pretty easily.
The main advantage of the xtrabackup over logical backups is its speed - performance is limited by your disk or network throughput. On the other hand, its much harder to recover single rows from the database. The ideal use case for xtrabackup is to recover a whole host from scratch or provision a new server. It comes with options to store information about MySQL replication or Galera writeset replication along with the backup. This is very useful if you need to provision a new replication slave or a new node in a cluster.
We’ll be talking here about backing up MySQL using snapshots - it does not matter much how you are taking those snapshots. It can be either LVM installed on a host (using LVM is not an uncommon way of setting up MySQL servers) or it could be a “cloudish” snapshot - EBS snapshot or it’s equivalent in your environment. If you use SAN as a storage for your MySQL server and you can generate a snapshot of a volume, it also belongs here. We will focus mostly on the AWS, though - it’s the most popular cloud environment.
In general, snapshots are a great way of backing up any data - it is quick and while it adds some overhead, there are definitely more pros of this method than cons. The main problem with backing up MySQL using the snapshot is consistency - taking a snapshot on the server is comparable to a forced power off. If you run your MySQL server in full durability mode, you should be just fine. If not, it is possible that some of the transactions won’t make it to disk and, as a result, you will lose data. Of course, there are ways of dealing with this issue. First of all, you can change durability settings to more durable (SET GLOBAL innodb_flush_log_at_trx_commit=1, SET GLOBAL sync_binlog=1) prior to the snapshot and then revert back to the original settings after a snapshot has been started. This is the least impacting way of making sure your snapshot is consistent. Another method include stopping a slave (if the replication is the only means of modifying data on a given host) and then run FLUSH TABLES. You can also stop the activity by using FLUSH TABLES WITH READ LOCK to get a consistent state of the database. What is important to keep in mind, though, is that no matter which approach you take, you will end up with data in “crashed” state - if you’d like to use this data to create a new MySQL server, at the first start MySQL will have to perform recovery procedures on InnoDB tables. InnoDB recovery, on the other hand, may take a while, hours even - depending on the amount of modifications.
One way to go around this problem is to take cold backups. As they involve stopping MySQL before taking a snapshot, you can be sure that data is consistent and it’s all just a matter of starting MySQL to get a new server up. No recovery is needed because data came from a server which did a clean shutdown. Of course, stopping MySQL servers is not an ideal way to handle backups but sometimes it is feasible. For example, maybe you have a slave dedicated to ad-hoc queries, executed manually, which does not have to be up all the time? You could use such a server also as a backup host, shutting down MySQL from time to time in order to take a clean snapshot of its data.
As we discussed above, getting a consistent snapshot may be tricky at times. On the pro side, snapshots are a great way of provisioning new instances. This is true especially in the cloud, where you can easily create a new node using a few clicks or API calls. That it’s all true as long as you use a single volume for your data directory. Until recently, to get a decent I/O performance in EC2, the only option was to use multiple EBS volumes and setup a RAID0 over them. It was caused by a limit of how many pIOPS a single EBS instance may have. This limit has increased significantly (to 20k pIOPS), but even now there are still reasons to use RAIDed approach. In such a setup, you can’t just take snapshots and hope for the best - such snapshots will be inconsistent on RAID level, not to mention MySQL level. Cold backup will still work, as MySQL is down and no disk activity should happen (as long as MySQL data directory is located on a separate device). For more “hot” approaches, you may want to look at ec2-consistent-snapshot - a tool that gives you some options how to perform a consistent snapshot of a RAIDed volume with several EBSes under the hood. It can help you to automate some MySQL tasks like stopping a slave and running FLUSH TABLES WITH READ LOCK. It can also freeze the filesystem on operating system level. ec2-consistent-snapshot is tricky to setup and needs detailed tests, but it is one of the options to pick from.
Good practices and guidelines
We covered some ways in which you can take a backup of the MySQL database. It is time to put it together and discuss how you could setup an efficient backup process.
The main problem is that all of the backup methods have their pros and cons. They also have their requirements when it comes to how they affect regular workloads. As usual, how you’d like to make backups depends on the business requirements, environment and resources. We’d still like to share some guidelines with you.
First of all, you want to have an ability to perform point-in-time recovery. It means that you have to copy binary logs along your backup. It can be either copy from disk to disk or EBS snapshot of a volume where binlogs are located - you have to have them available.
Second - you probably want to have an ability to restore single rows. Now, everything depends on your environment. One way would be to take a logical backup of your system but it may be hard to execute on a large data set. On the other hand, if you can restore a database from the physical backup (for example, click to create a new EBS volume out of the snapshot, click to create a new EC2 instance, click to attach EBS to it), you could be just fine with this process and you won’t have to worry about the logical backup at all.
For larger databases you will be forced to use one of the physical backup methods because of the time needed to perform logical one. Next question - how often do you want to perform a backup? You have binary logs so, theoretically speaking, it should be just fine to get a backup once per day and restore the rest of the data from binlogs. In real world, though, replaying binlogs is a slow and painful process. Of course, your mileage may wary - it all depends on the amount of modifications to the database. So, you need to test it - how quickly you can process and replay binary logs in your environment? How it looks like compared to your business requirements which determines maximum allowed downtime? If you use snapshots - how long the recovery process takes? Or, if you use a cold backup approach, how often can you stop your MySQL and take a snapshot? Even on a dedicated instance, you can’t really do it more often than once per 15 - 30 minutes, workload and traffic permitting. Remember, cold backup = replication lag, no matter if you use regular replication or Galera Cluster (in Galera it’s just called differently - node is in Desync state and applying missing writesets after IST). The backup node has to be able to catch up between backups.
Xtrabackup is a great tool for taking backups - using its incremental backup feature, you can easily take deltas every five minutes or so. On the other hand, restoring those increments may take long time and is error-prone - there is a bunch of not yet discovered bugs in both xtrabackup and InnoDB which sometimes corrupts backups and render them useless. If one of the incremental backups is corrupted, the rest will not be usable. This leads us to another important point - how good is the backup data?
You have to test your backups. We mentioned it in a previous post - as a part of the healthcheck you should be checking if the backup, whichever method you choose to use, looks sane. Looking at file sizes is not enough though. From time to time, for example on monthly basis (but again, it depends on your business requirements), you should perform a full restore test - get a test server, install MySQL, restore data from the backup, test that you can join the Galera cluster or slave it off the master. Having backups is not enough - you need to ensure you have working backups.
We hope this introduction to MySQL backup methods will help you find your own solution in safeguarding your data. The main thing to keep in mind is that you should not be afraid of testing - if you don’t know whether your backup process design makes sense, do test it. As long as you have a working backup process that fulfills your organization’s requirements, there is no bad way of designing it. Just remember to test the restore from time to time and ensure you still can restore the database in a timely manner - databases change, their content too. Usually it grows. What was acceptable one year ago may not be acceptable today - you also need to take that under consideration.
- Using s9sbackup to manage your backups - http://www.severalnines.com/blog/simple-backup-management-galera-cluster-using-s9sbackup
- mysqldump or XtraBackup? Backup Strategies for MySQL - http://www.severalnines.com/blog/mysqldump-or-percona-xtrabackup-backup-strategies-mysql-galera-cluster
- Full restore of MySQL Galera Cluster from Backup - http://www.severalnines.com/blog/full-restore-mysql-galera-cluster-backup