Migrating Amazon RDS (MySQL or MariaDB) to an On-Prem Server

Paul Namuag

Amazon Web Services is a technology giant, especially when it comes to pioneering itself in top-of-the-line cloud computing services. Its fully managed services products (Amazon RDS) is one of a kind. But then again, while it can be a perfect platform for some organizations, it can be a challenge to move out from it if it’s not. There are always concerns of being stuck in a vendor lock-in situation.

Some things to keep in mind when migrating from RDS to an on-premise platform are budget constraints, security, and data autonomy. This is because data is your most valuable asset and retaining control wherever it resides, it is always imperative for the organization and company to always remain competitive. No organization can afford to have cloud lock-in, and yet, many enterprises find themselves exactly in that situation and start hopping searching for any alternative existing solutions that can be operable via on-prem.

This blog will walk you through how to migrate from Amazon RDS going to an on-prem server. Our target database on the on-prem server is on a RHEL/CentOS Linux server, but the applicable procedure shall apply on other versions of Linux as well as long as packages are properly installed. 

There are some third-party existing solutions that offer data migration but it's not applicable for an on-premise platform. Additionally, it is not free and migrating using free with open source solutions is always favorable and advantageous. Though doubts and concerns also exist since the warranty and support is not bound with open-source technologies but we will show you here how to achieve this in a straightforward procedure.

Since Amazon RDS supports compatibility with MySQL and MariaDB. We will focus on them for this blog.

Migrating from Amazon RDS for MySQL or MariaDB

A typical approach of migrating your data from Amazon RDS to an on-prem server is to take a backup using a logical copy. This can be done using backup utility solutions that are compatible to operate with Amazon RDS which is a fully-managed service. Fully-managed database services do not offer SSH logins so physical copy of backups is not an option.

Using mysqldump

Using mysqldump has to be installed in your target database node located on-prem. It has to be prepared as a replica of the AWS RDS node so all subsequent transactions shall be replicated to the node. To do this, follow the steps below.

AWS RDS source host: database-1.xxxxxxx.us-east-2.rds.amazonaws.com

On-Prem Server Host: 192.168.10.226 (testnode26)

Before starting the dump, make sure that the binlog retention hours is set. To set it, you can do like the example procedure call below in your Amazon RDS instance,

mysql> call mysql.rds_set_configuration('binlog retention hours', 24);

Query OK, 2 rows affected (0.23 sec)



mysql> CALL mysql.rds_show_configuration;

+------------------------+-------+------------------------------------------------------------------------------------------------------+

| name                   | value | description                                                                                          |

+------------------------+-------+------------------------------------------------------------------------------------------------------+

| binlog retention hours | 24    | binlog retention hours specifies the duration in hours before binary logs are automatically deleted. |

+------------------------+-------+------------------------------------------------------------------------------------------------------+

1 row in set (0.23 sec)



Query OK, 0 rows affected (0.23 sec)

Install mysqldump

  1. Prepare the repository. 

# For MySQL

$ yum install https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm

# For MariaDB

$ curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash
  1. Install mysql-client package

# For MySQL

$ yum install -y mysql-community-client.x86_64

# For MariaDB

$ yum install -y MariaDB-client
  1. Create a data dump using mysqldump by executing it inside the target node. Take note, with --master-data=2 specified as an option, this works only for MariaDB but not in MySQL. So extra-work for MySQL has to be done. We'll talk on this later.

## Applicable for MariaDB approach

[[email protected] ~]# mysqldump -h database-1.xxxxxxx.us-east-2.rds.amazonaws.com -uadmin -p --single-transaction --master-data=2 --databases db1 db2 db3  > backups/dump.sql

Enter password:

[[email protected] ~]# ls -alth backups/dump.sql

-rw-r--r--. 1 root root 196M Oct 18 02:34 backups/dump.sql
  1. Install the MySQL/MariaDB Server in the target database node

# For MySQL (always check what version repository is enabled in your yum repository. At this point, I'm using MySQL 5.7)

$ yum --disablerepo=* --enablerepo=mysql57-community install mysql-community-common mysql-community-client mysql-community-server

# For MariaDB

$ yum install MariaDB-server.x86_64
  1. Setup the MySQL/MariaDB Server instance (my.cnf, file permissions, directories), and start the server 

# Setting up the my.cnf (using the my.cnf deployment use by ClusterControl)

[MYSQLD]

user=mysql

basedir=/usr/

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

pid_file=/var/lib/mysql/mysql.pid

port=3306

log_error=/var/log/mysql/mysqld.log

log_warnings=2

slow_query_log_file=/var/log/mysql/mysql-slow.log

long_query_time=2

slow_query_log=OFF

log_queries_not_using_indexes=OFF

innodb_buffer_pool_size=2G

innodb_flush_log_at_trx_commit=2

innodb_file_per_table=1

innodb_data_file_path=ibdata1:100M:autoextend

innodb_read_io_threads=4

innodb_write_io_threads=4

innodb_doublewrite=1

innodb_log_file_size=256M

innodb_log_buffer_size=32M

innodb_buffer_pool_instances=1

innodb_log_files_in_group=2

innodb_thread_concurrency=0

innodb_flush_method=O_DIRECT

innodb_rollback_on_timeout=ON

innodb_autoinc_lock_mode=2

innodb_stats_on_metadata=0

default_storage_engine=innodb

server_id=1126

binlog_format=ROW

log_bin=binlog

log_slave_updates=1

relay_log=relay-bin

expire_logs_days=7

read_only=OFF

report_host=192.168.10.226

key_buffer_size=24M

tmp_table_size=64M

max_heap_table_size=64M

max_allowed_packet=512M

skip_name_resolve=true

memlock=0

sysdate_is_now=1

max_connections=500

thread_cache_size=512

query_cache_type=0

query_cache_size=0

table_open_cache=1024

lower_case_table_names=0

performance_schema=OFF

performance-schema-max-mutex-classes=0

performance-schema-max-mutex-instances=0



[MYSQL]

socket=/var/lib/mysql/mysql.sock



[client]

socket=/var/lib/mysql/mysql.sock



[mysqldump]

socket=/var/lib/mysql/mysql.sock

max_allowed_packet=512M

            ## Reset the data directory and re-install the database system files

$ rm -rf /var/lib/mysql/*

## Create the log directories

$ mkdir /var/log/mysql

$ chown -R mysql.mysql /var/log/mysql

## For MySQL

$ mysqld --initialize

## For MariaDB

$ mysql_install_db

 

  1. Start the MySQL/MariaDB Server

## For MySQL

$ systemctl start mysqld

## For MariaDB

$ systemctl start mariadb
  1. Load the data dump we have taken from AWS RDS to the target database node on-prem

$ mysql --show-warnings < backups/dump.sql
  1. Create the replication user from the AWS RDS source node

MariaDB [(none)]> CREATE USER 'repl_user'@'149.145.213.%' IDENTIFIED BY 'repl_passw0rd';

Query OK, 0 rows affected (0.242 sec)



MariaDB [(none)]>  GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO repl_user'@'149.145.213.%'  IDENTIFIED BY 'repl_passw0rd' ;

Query OK, 0 rows affected (0.229 sec)
  1. Set up the MySQL/MariaDB Server as a replica/slave of the AWS RDS source node

## First, let's search or locate the CHANGE MASTER command

[[email protected] ~]# grep -rn -E -i 'change master to master' backups/dump.sql |head -1

22:-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin-changelog.000584', MASTER_LOG_POS=421;

## Run the CHANGE MASTER statement but add the replication user/password and the hostname as follows,

MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='database-1.xxxxxxx.us-east-2.rds.amazonaws.com', MASTER_LOG_FILE='mysql-bin-changelog.000584', MASTER_LOG_POS=421, MASTER_USER='repl_user', MASTER_PASSWORD='repl_passw0rd';

Query OK, 0 rows affected (0.004 sec)

## Then start the slave threads

MariaDB [(none)]> START SLAVE;

Query OK, 0 rows affected (0.001 sec)

## Check the slave status how it goes

MariaDB [(none)]> SHOW SLAVE STATUS \G

*************************** 1. row ***************************

                Slave_IO_State: Waiting for master to send event

                   Master_Host: database-1.xxxxxxx.us-east-2.rds.amazonaws.com

                   Master_User: repl_user

                   Master_Port: 3306

                 Connect_Retry: 60

               Master_Log_File: mysql-bin-changelog.000584

           Read_Master_Log_Pos: 421

                Relay_Log_File: relay-bin.000001

                 Relay_Log_Pos: 4

         Relay_Master_Log_File: mysql-bin-changelog.000584

              Slave_IO_Running: Yes

             Slave_SQL_Running: Yes

               Replicate_Do_DB:

           Replicate_Ignore_DB:

            Replicate_Do_Table:

        Replicate_Ignore_Table:

       Replicate_Wild_Do_Table:

   Replicate_Wild_Ignore_Table:

                    Last_Errno: 0

                    Last_Error:

                  Skip_Counter: 0

           Exec_Master_Log_Pos: 421

               Relay_Log_Space: 256

               Until_Condition: None

                Until_Log_File:

                 Until_Log_Pos: 0

            Master_SSL_Allowed: No

            Master_SSL_CA_File:

            Master_SSL_CA_Path:

               Master_SSL_Cert:

             Master_SSL_Cipher:

                Master_SSL_Key:

         Seconds_Behind_Master: 0

 Master_SSL_Verify_Server_Cert: No

                 Last_IO_Errno: 0

                 Last_IO_Error:

                Last_SQL_Errno: 0

                Last_SQL_Error:

   Replicate_Ignore_Server_Ids:

              Master_Server_Id: 1675507089

                Master_SSL_Crl:

            Master_SSL_Crlpath:

                    Using_Gtid: No

                   Gtid_IO_Pos:

       Replicate_Do_Domain_Ids:

   Replicate_Ignore_Domain_Ids:

                 Parallel_Mode: optimistic

                     SQL_Delay: 0

           SQL_Remaining_Delay: NULL

       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

              Slave_DDL_Groups: 0

Slave_Non_Transactional_Groups: 0

    Slave_Transactional_Groups: 0

1 row in set (0.000 sec)

Now that we have finally been able to replicate from RDS as the source or master of our replica located on-prem. It's not done yet. There are some cases you'll encounter replication errors such as,      

Last_SQL_Errno: 1146

                Last_SQL_Error: Error 'Table 'mysql.rds_heartbeat2' doesn't exist' on query. Default database: 'mysql'. Query: 'INSERT INTO mysql.rds_heartbeat2(id, value) values (1,1602988485784) ON DUPLICATE KEY UPDATE value = 1602988485784'

 Since on-prem does not need to replicate data coming from mysql database for tables prefixed with 'rds%', then we just ignore these tables during replication. Additionally, you might not want AWS RDS to update and change your mysql.user table. To do this, you can optionally ignore the schema or just list of tables such as,

STOP SLAVE;

Then,

SET GLOBAL replicate_wild_ignore_table='mysql.rds%';

or

SET GLOBAL replicate_wild_ignore_table='mysql.%';

The MySQL Problem With --master-data=2

Taking the mysqldump with --master-data=2 requires sufficient privileges which requires SUPER and RELOAD privileges. The problem is, AWS RDS does not supply this for the admin user during database setup and creation. In order to workaround this issue, it must be that your AWS RDS has a master and a replica or slave setup. Once you have a slave setup, take that as the target source host when taking mysqldump. Then stop the slave threads from your AWS RDS replica as follows,

rds-replica-mysql> CALL mysql.rds_stop_replication;

Then take the mysqldump without the --master-data option just like below,

mysqldump -h database-1.xxxxxxx.us-east-2.rds.amazonaws.com -uadmin -p --single-transaction --databases db1 db2 db3  > backups/dump.sql

Then run the SHOW SLAVE STATUS\G from your AWS RDS replica and take note of the  Master_Log_File and Exec_Master_Log_Pos for which you will use when connecting to the AWS RDS master replicating to your on-prem server. Use those coordinates when running CHANGE MASTER TO… MASTER_LOG_FILE=Master_Log_File, MASTER_LOG_POS=<Exec_Master_Log_Pos>. Of course, once the backup has been done, do not forget to start your RDS replica to start its replication threads again,

rds-replica-mysql> CALL mysql.rds_start_replication;

Using mydumper

mydumper can be your alternative option here especially when the dataset is very large as it offers parallelism and speed when taking a dump or backup copy of your dataset from a source RDS node. Follow the steps below from installing the mydumper to loading it to your destination on-prem server.

  1. Install the binary. The binaries can be located here https://github.com/maxbube/mydumper/releases.

 $ yum install https://github.com/maxbube/mydumper/releases/download/v0.9.5/mydumper-0.9.5-2.el6.x86_64.rpm
  1. Take the backup from the RDS source node. For example,

[[email protected] mydumper-2]# /usr/bin/mydumper --outputdir=. --verbose=3 --host=database-1.xxxxxxx.us-east-2.rds.amazonaws.com --port=3306 --kill-long-queries --chunk-filesize=5120 --build-empty-files --events --routines --triggers --compress --less-locking --success-on-1146 --regex='(db1\.|db2\.|db3\.|mydb4\.|testdb5\.)' -u admin --password=admin123

** Message: Connected to a MySQL server



** (mydumper:18904): CRITICAL **: Couldn't acquire global lock, snapshots will not be consistent: Access denied for user 'admin'@'%' (using password: YES)

** Message: Started dump at: 2020-10-18 09:34:08



** Message: Written master status

** Message: Multisource slave detected.

** Message: Thread 5 connected using MySQL connection ID 1109

Now, at this point, mydumper will take a backup files in the form of *.gz files

  1. Load it to your destination on-premise server

$ myloader --host localhost --directory=$(pwd) --queries-per-transaction=10000 --threads=8 --compress-protocol --verbose=3

** Message: 8 threads created

** Message: Creating database `db1`

** Message: Creating table `db1`.`folders_rel`

** Message: Creating table `db2`.`p`

** Message: Creating table `db2`.`t1`

** Message: Creating table `db3`.`AddressCodeTest`
  1. Setup the destination node as a slave/replica. mydumper will include a file called metadata which consists of binary log coordinates including GTID positions, for example:                                                                                                                                                                         

$ cat metadata

Started dump at: 2020-10-18 10:23:35

SHOW MASTER STATUS:

        Log: mysql-bin-changelog.000680

        Pos: 676

        GTID:0-1675507089-3044

## Then run a change master from the replica or your target destination MySQL/MariaDB database node

MariaDB [jbmrcd_date]> CHANGE MASTER TO MASTER_HOST='database-1.cmu8qdlvkepg.us-east-2.rds.amazonaws.com', MASTER_USER='repl_user', MASTER_PASSWORD='repl_passw0rd',  MASTER_LOG_FILE='mysql-bin-changelog.000680', MASTER_LOG_POS

=676;

Query OK, 0 rows affected (0.002 sec)

## Start the slave

MariaDB [jbmrcd_date]> start slave;

Query OK, 0 rows affected (0.001 sec)

At this point, you have now replicated from an Amazon RDS instance running MySQL/MariaDB. Once your application is ready to move away from your Amazon RDS instance, setup the endpoint going to your on-prem server and all remaining transactions from your RDS instance will be replicated to your on-prem leaving no data being missed going to your on-prem server.

Check For Data Discrepancies

Once you have your data loaded or dumped to your on-prem server acting as a replica from the AWS RDS instance, you should double check this by running checksum calculations to determine how far your data is against the source Amazon RDS. I suggest you use pt-table-checksum tool by Percona, but you can create your own though by using checksumming tools such as md5 or sha256 but this takes time to do. Additionally, using pt-upgrade can help as well after your data migration using this replication approach is done.

Conclusion

Using mysqldump or mydumper are free open-source tools which is a great advantage as well especially if your data is very confidential and you do not want a third-party to access it. Although it might be simple to take this approach, there can be tedious and large work that can be involved as testing and double checks always follow in order to prove that migration is fully achieved without any data inconsistencies.

ClusterControl
The only management system you’ll ever need to take control of your open source database infrastructure.