All companies nowadays have (or should have) a Disaster Recovery Plan (DRP) to prevent data loss in the case of failure; built according to an acceptable Recovery Point Objective (RPO) for the business.
A backup is a basic start in any DRP, but to guarantee the backup usability a single backup is just not enough. The best practice is to store the backup files in three different places, one stored locally on the database server (for faster recovery), another one in a centralized backup server, and the last one the cloud. For this last step, you should choose a stable and robust cloud provider to make sure your data is stored correctly and is accessible at any time.
In this blog, we will take a look at one of the most famous cloud providers, Google Cloud Platform (GCP) and how to use it to store your PostgreSQL backups in the cloud.
About Google Cloud
Google Cloud offers a wide range of products for your workload. Let’s look at some of them and how they are related to storing PostgreSQL backups in the cloud.
- Cloud Storage: It allows for world-wide storage and retrieval of any amount of data at any time. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.
- Cloud SQL: It’s a fully managed database service that makes it easy to set up, maintain, manage, and administer your relational PostgreSQL, MySQL, and SQL Server databases in the cloud.
- Compute Engine: It delivers virtual machines running in Google Cloud with support to scaling from single instances to global, load-balanced cloud computing. Compute Engine’s VMs boot quickly, come with high-performance persistent and local disk options, and deliver consistent performance.
Storing Backups on Google Cloud
If you’re running your PostgreSQL database on Google Cloud with Cloud SQL you can back it up directly from the Google Cloud Platform, however, it’s not necessary to run it here to store your PostgreSQL backups.
Google Cloud Storage
Similar to the well-known Amazon S3 product, if you’re not running your PostgreSQL database with Cloud SQL, this is the most commonly used option to store backups or files in Google Cloud. It’s accessible from the Google Cloud Platform, in the Getting Started section or under the Storage left menu. With Cloud Storage, you can even easily transfer your S3 content here using the Transfer feature.
How to Use Google Cloud Storage
First, you need to create a new Bucket to store your data, so go to Google Cloud Platform -> Storage -> Create Bucket
In the first step, you need to just add a new bucket name.
In the next step, you can specify the location type (multi-region by default) and the location place.
Then, you can change the storage class from standard (default option) to nearline or coldline.
And then, you can change the control access.
Finally, you have some optional settings like encryption or retention policy.
Now you have your new bucket created, we will see how to use it.
Using the GSutil Tool
GSutil is a Python application that lets you access Cloud Storage from the command line. It allows you to perform different bucket and object management tasks. Let’s see how to install it on CentOS 7 and how to upload a backup using it.
Download Cloud SDK:
$ curl https://sdk.cloud.google.com | bash
Restart your shell:
$ exec -l $SHELL
Run gcloud init and configure the tool:
$ gcloud init
This command will ask you to login to your Google Cloud account by accessing a URL and adding an authentication code.
Now you have the tool installed and configured, let’s upload a backup to the bucket.
First, let’s check our buckets created:
[root@PG1bkp ~]# gsutil ls
And to copy your PostgreSQL backup (or another file), run:
[root@PG1bkp ~]# gsutil cp /root/backups/BACKUP-3/base.tar.gz gs://pgbackups1/new_backup/
Copying file:///root/backups/BACKUP-3/base.tar.gz [Content-Type=application/x-tar]...
| [1 files][ 4.9 MiB/ 4.9 MiB]
Operation completed over 1 objects/4.9 MiB.
The destination bucket must exist.
And then, you can list the contents of the new_backup directory, to check the file uploaded:
[root@PG1bkp ~]# gsutil ls -r gs://pgbackups1/new_backup/*
For more information about the GSutil usage, you can check the official documentation.
Google Cloud SQL
If you want to centralize all the environment (database + backups) into Google Cloud, you have available this Cloud SQL product. In this way, you will have your PostgreSQL database running on Google Cloud and you can also manage the backups from the same platform. It’s accessible from the Google Cloud Platform, in the Getting started section or under the Storage left menu.
How to Use Google Cloud SQL
To create a new PostgreSQL instance, go to Google Cloud Platform -> SQL -> Create Instance
Here you can choose between MySQL and PostgreSQL as the database engine. For this blog, let’s create a PostgreSQL instance.
Now, you need to add an instance ID, password, location and PostgreSQL version (9.6 or 11).
You have also some configuration options, like enable Public IP Address, Machine type and storage, and backups, etc.
When the Cloud SQL instance is created, you can select it and you will see an overview of this new instance.
And you can go to the Backups section to manage your PostgreSQL backups.
To reduce storage costs, backups work incrementally. Each backup stores only the changes to your data since the previous backup.
Google Cloud Compute Engine
Similar to Amazon EC2, this way to store information in the cloud is more expensive and time-consuming than Cloud Storage, but you will have full control over the backup storage environment. It’s also accessible from the Google Cloud Platform, in the Getting started section or under the Compute left menu.
How to Use a Google Cloud Compute Engine
To create a new virtual machine, go to Google Cloud Platform -> Compute Engine -> Create Instance
Here you need to add an instance name, region, and zone where to create it. Also, you need to specify the machine configuration according to your hardware and usage requirements, and the disk size and operating system to use for the new virtual machine.
When the instance is ready, you can store the backups here, for example, sending it via SSH or FTP using the external IP Address. Let’s look at an example with Rsync and another one with SCP Linux command.
To connect via SSH to the new virtual machine, make sure you have added your SSH key in the virtual machine configuration.
[root@PG1bkp ~]# rsync -avzP -e "ssh -i /home/sinsausti/.ssh/id_rsa" /root/backups/BACKUP-3/base.tar.gz [email protected]:/home/sinsausti/pgbackups/
sending incremental file list
5,155,420 100% 1.86MB/s 0:00:02 (xfr#1, to-chk=0/1)
sent 4,719,597 bytes received 35 bytes 629,284.27 bytes/sec
total size is 5,155,420 speedup is 1.09
[root@PG1bkp ~]# scp -i /home/sinsausti/.ssh/id_rsa /root/backups/BACKUP-5/base.tar.gz [email protected]:/home/sinsausti/pgbackups/
base.tar.gz 100% 2905KB 968.2KB/s 00:03
You can easily embed this into a script to perform an automatic backup process or use this product with an external system like ClusterControl to manage your backups.
Managing Your Backups with ClusterControl
In the same way that you can centralize the management for both database and backup from the same platform by using Cloud SQL, you can use ClusterControl for several management tasks related to your PostgreSQL database.
ClusterControl is a comprehensive management system for open source databases that automates deployment and management functions, as well as health and performance monitoring. ClusterControl supports deployment, management, monitoring and scaling for different database technologies and environments. So, you can, for example, create our Virtual Machine instance on Google Cloud, and deploy/import our database service with ClusterControl.
Creating a Backup
For this task, go to ClusterControl -> Select Cluster -> Backup -> Create Backup.
You can create a new backup or configure a scheduled one. For our example, we will create a single backup instantly.
You must choose one method, the server from which the backup will be taken, and where you want to store the backup. You can also upload our backup to the cloud (AWS, Google or Azure) by enabling the corresponding button.
Then specify the use of compression, the compression level, encryption and retention period for your backup.
If you enabled the upload backup to the cloud option, you will see a section to specify the cloud provider (in this case Google Cloud) and the credentials (ClusterControl -> Integrations -> Cloud Providers). For Google Cloud, it uses Cloud Storage, so you must select a Bucket or even create a new one to store your backups.
On the backup section, you can see the progress of the backup, and information like method, size, location, and more.
Google Cloud may be a good option to store your PostgreSQL backups and it offers different products to make this. It’s not, however, necessary to have your PostgreSQL databases running there as you can use it only as a storage location.
The GSutil tool is a nice product for managing your Cloud Storage data from the command line, easy-to-use and fast.
You can also combine Google Cloud and ClusterControl to improve your PostgreSQL high availability environment and monitoring system. If you want to know more about PostgreSQL on Google Cloud you can check our deep dive blog post.