Tips for Storing PostgreSQL Backups on Google Cloud (GCP)

Sebastian Insausti

All companies nowadays have (or should have) a Disaster Recovery Plan (DRP) to prevent data loss in the case of failure; built according to an acceptable Recovery Point Objective (RPO) for the business.

A backup is a basic start in any DRP, but to guarantee the backup usability a single backup is just not enough. The best practice is to store the backup files in three different places, one stored locally on the database server (for faster recovery), another one in a centralized backup server, and the last one the cloud. For this last step, you should choose a stable and robust cloud provider to make sure your data is stored correctly and is accessible at any time.

In this blog, we will take a look at one of the most famous cloud providers, Google Cloud Platform (GCP) and how to use it to store your PostgreSQL backups in the cloud.

About Google Cloud

Google Cloud offers a wide range of products for your workload. Let’s look at some of them and how they are related to storing PostgreSQL backups in the cloud.

  • Cloud Storage: It allows for world-wide storage and retrieval of any amount of data at any time. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.
  • Cloud SQL: It’s a fully managed database service that makes it easy to set up, maintain, manage, and administer your relational PostgreSQL, MySQL, and SQL Server databases in the cloud.
  • Compute Engine: It delivers virtual machines running in Google Cloud with support to scaling from single instances to global, load-balanced cloud computing. Compute Engine's VMs boot quickly, come with high-performance persistent and local disk options, and deliver consistent performance. 

Storing Backups on Google Cloud

If you’re running your PostgreSQL database on Google Cloud with Cloud SQL you can back it up directly from the Google Cloud Platform, however, it’s not necessary to run it here to store your PostgreSQL backups.

Google Cloud Platform

Google Cloud Storage

Similar to the well-known Amazon S3 product, if you’re not running your PostgreSQL database with Cloud SQL, this is the most commonly used option to store backups or files in Google Cloud. It’s accessible from the Google Cloud Platform, in the Getting Started section or under the Storage left menu. With Cloud Storage, you can even easily transfer your S3 content here using the Transfer feature.

How to Use Google Cloud Storage

First, you need to create a new Bucket to store your data, so go to Google Cloud Platform -> Storage -> Create Bucket

Name Your Bucket - Google Cloud

In the first step, you need to just add a new bucket name.

Choose Where to Store Your Data - Google Cloud

In the next step, you can specify the location type (multi-region by default) and the location place.

Choose Storage Class - Google Cloud

Then, you can change the storage class from standard (default option) to nearline or coldline.

Access - Google Cloud

And then, you can change the control access.

Advanced Setting - Google Cloud

Finally, you have some optional settings like encryption or retention policy.

Now you have your new bucket created, we will see how to use it.

Using the GSutil Tool

GSutil is a Python application that lets you access Cloud Storage from the command line. It allows you to perform different bucket and object management tasks. Let’s see how to install it on CentOS 7 and how to upload a backup using it.

Download Cloud SDK:

$ curl https://sdk.cloud.google.com | bash

Restart your shell:

$ exec -l $SHELL

Run gcloud init and configure the tool:

$ gcloud init

This command will ask you to login to your Google Cloud account by accessing a URL and adding an authentication code.

Now you have the tool installed and configured, let’s upload a backup to the bucket.

First, let’s check our buckets created:

[[email protected] ~]# gsutil ls

gs://pgbackups1/

And to copy your PostgreSQL backup (or another file), run:

[[email protected] ~]# gsutil cp /root/backups/BACKUP-3/base.tar.gz gs://pgbackups1/new_backup/

Copying file:///root/backups/BACKUP-3/base.tar.gz [Content-Type=application/x-tar]...

| [1 files][  4.9 MiB/ 4.9 MiB]

Operation completed over 1 objects/4.9 MiB.

The destination bucket must exist. 

And then, you can list the contents of the new_backup directory, to check the file uploaded:

[[email protected] ~]# gsutil ls -r gs://pgbackups1/new_backup/*

gs://pgbackups1/new_backup/

gs://pgbackups1/new_backup/base.tar.gz

For more information about the GSutil usage, you can check the official documentation.

Google Cloud SQL

If you want to centralize all the environment (database + backups) into Google Cloud, you have available this Cloud SQL product. In this way, you will have your PostgreSQL database running on Google Cloud and you can also manage the backups from the same platform. It’s accessible from the Google Cloud Platform, in the Getting started section or under the Storage left menu.

How to Use Google Cloud SQL

To create a new PostgreSQL instance, go to Google Cloud Platform -> SQL -> Create Instance

Google Cloud SQL - Create Instance

Here you can choose between MySQL and PostgreSQL as the database engine. For this blog, let’s create a PostgreSQL instance.

Google Cloud SQL - Instance Info

Now, you need to add an instance ID, password, location and PostgreSQL version (9.6 or 11).

Google Cloud SQL - Configuration Options

You have also some configuration options, like enable Public IP Address, Machine type and storage, and backups, etc. 

When the Cloud SQL instance is created, you can select it and you will see an overview of this new instance.

PostgreSQL on Google Cloud SQL

And you can go to the Backups section to manage your PostgreSQL backups. 

Google Cloud SQL Backups

To reduce storage costs, backups work incrementally. Each backup stores only the changes to your data since the previous backup.

Google Cloud Compute Engine

Similar to Amazon EC2, this way to store information in the cloud is more expensive and time-consuming than Cloud Storage, but you will have full control over the backup storage environment.  It’s also accessible from the Google Cloud Platform, in the Getting started section or under the Compute left menu.

How to Use a Google Cloud Compute Engine

To create a new virtual machine, go to Google Cloud Platform -> Compute Engine -> Create Instance

Google Cloud - Create Compute Instance

Here you need to add an instance name, region, and zone where to create it. Also, you need to specify the machine configuration according to your hardware and usage requirements, and the disk size and operating system to use for the new virtual machine. 

Google Cloud Compute Engine

When the instance is ready, you can store the backups here, for example, sending it via SSH or FTP using the external IP Address. Let’s look at an example with Rsync and another one with SCP Linux command.

To connect via SSH to the new virtual machine, make sure you have added your SSH key in the virtual machine configuration.

[[email protected] ~]# rsync -avzP -e "ssh -i /home/sinsausti/.ssh/id_rsa" /root/backups/BACKUP-3/base.tar.gz [email protected]:/home/sinsausti/pgbackups/

sending incremental file list

base.tar.gz

      5,155,420 100%    1.86MB/s 0:00:02 (xfr#1, to-chk=0/1)



sent 4,719,597 bytes  received 35 bytes 629,284.27 bytes/sec

total size is 5,155,420  speedup is 1.09

[[email protected] ~]#

[[email protected] ~]# scp -i /home/sinsausti/.ssh/id_rsa /root/backups/BACKUP-5/base.tar.gz [email protected]:/home/sinsausti/pgbackups/

base.tar.gz                                                                                                                                                             100% 2905KB 968.2KB/s 00:03

[[email protected] ~]#

You can easily embed this into a script to perform an automatic backup process or use this product with an external system like ClusterControl to manage your backups.

Managing Your Backups with ClusterControl

In the same way that you can centralize the management for both database and backup from the same platform by using Cloud SQL, you can use ClusterControl for several management tasks related to your PostgreSQL database.

ClusterControl is a comprehensive management system for open source databases that automates deployment and management functions, as well as health and performance monitoring. ClusterControl supports deployment, management, monitoring and scaling for different database technologies and environments. So, you can, for example, create our Virtual Machine instance on Google Cloud, and deploy/import our database service with ClusterControl.

ClusterControl PostgreSQL

Creating a Backup

For this task, go to ClusterControl -> Select Cluster -> Backup -> Create Backup.

ClusterControl - Create Backup

You can create a new backup or configure a scheduled one. For our example, we will create a single backup instantly.

ClusterControl - Choose Backup Method

You must choose one method, the server from which the backup will be taken, and where you want to store the backup. You can also upload our backup to the cloud (AWS, Google or Azure) by enabling the corresponding button.

ClusterControl - Backup Configuration

Then specify the use of compression, the compression level, encryption and retention period for your backup.

ClusterControl - Cloud Credentials for Backup

If you enabled the upload backup to the cloud option, you will see a section to specify the cloud provider (in this case Google Cloud) and the credentials (ClusterControl -> Integrations -> Cloud Providers). For Google Cloud, it uses Cloud Storage, so you must select a Bucket or even create a new one to store your backups.

ClusterControl Backup Management

On the backup section, you can see the progress of the backup, and information like method, size, location, and more.

Conclusion

Google Cloud may be a good option to store your PostgreSQL backups and it offers different products to make this. It’s not, however, necessary to have your PostgreSQL databases running there as you can use it only as a storage location. 

The GSutil tool is a nice product for managing your Cloud Storage data from the command line, easy-to-use and fast. 

You can also combine Google Cloud and ClusterControl to improve your PostgreSQL high availability environment and monitoring system. If you want to know more about PostgreSQL on Google Cloud you can check our deep dive blog post.

ClusterControl
The only management system you’ll ever need to take control of your open source database infrastructure.