blog

Cloud Vendor Deep-Dive: PostgreSQL on Microsoft Azure

Viorel Tabara

Published:

If you have followed Microsoft lately it will come as no surprise that the provider of a competing database product, namely SQL Server, also jumped on the PostgreSQL bandwagon. From releasing 60,000 patents to OIN to being Platinum sponsor at PGCon, Microsoft as one of the PostgreSQL corporate backing organizations. Took every opportunity for showing that not only can you run PostgreSQL on Microsoft, but also the reverse is true: Microsoft, through its cloud offering, can run PostgreSQL for you. The statement became even more clear with the acquisition of Citus Data and the release of their flagship product in the Azure Cloud under the name of Hyperscale. It is safe to say that PostgreSQL adoption is growing and now there are even more good reasons to choose it.

My journey through the Azure cloud started right at the landing page where I meet the contenders: Single Server and a preview (in other words no SLA provided) release of Hyperscale (Citus). This blog will focus on the former. While on this journey, I had the opportunity to practice what open source is all about — giving back to the community — in this case, by providing feedback to the documentation that, to Microsoft’s credit, they make this very easy by piping the feedback straight into Github:

 My Azure Documentation Feedback Issues

PostgreSQL Compatibility with Azure

Versioning

According to product documentation Single Server targets PostgreSQL versions in the n-2 major range:

 Single server PostgreSQL versions

As a solution built for performance Single Server is recommended for data sets 100 GB and larger. The servers provided predictable performance — the database instances come with a predefined number of vCores and IOPS (based on the size of provisioned storage).

Extensions

There is a fair number of Supported Extensions with some of them being installed out of the box:

postgres@pg10:5432 postgres> select name, default_version, installed_version from pg_available_extensions where name !~ '^postgis' order by name;

            name             | default_version | installed_version

------------------------------+-----------------+-------------------

address_standardizer         | 2.4.3 |

address_standardizer_data_us | 2.4.3           |

btree_gin                    | 1.2 |

btree_gist                   | 1.5 |

chkpass                      | 1.0 |

citext                       | 1.4 |

cube                         | 1.2 |

dblink                       | 1.2 |

dict_int                     | 1.0 |

earthdistance                | 1.1 |

fuzzystrmatch                | 1.1 |

hstore                       | 1.4 |

hypopg                       | 1.1.1 |

intarray                     | 1.2 |

isn                          | 1.1 |

ltree                        | 1.1 |

orafce                       | 3.7 |

pg_buffercache               | 1.3 | 1.3

pg_partman                   | 2.6.3 |

pg_prewarm                   | 1.1 |

pg_qs                        | 1.1 |

pg_stat_statements           | 1.6 | 1.6

pg_trgm                      | 1.3 |

pg_wait_sampling             | 1.1 |

pgcrypto                     | 1.3 |

pgrouting                    | 2.5.2 |

pgrowlocks                   | 1.2 |

pgstattuple                  | 1.5 |

plpgsql                      | 1.0 | 1.0

plv8                         | 2.1.0 |

postgres_fdw                 | 1.0 |

tablefunc                    | 1.0 |

timescaledb                  | 1.1.1 |

unaccent                     | 1.1 |

uuid-ossp                    | 1.1 |

(35 rows)

PostgreSQL Monitoring on Azure

Server monitoring relies on a set of metrics that can be neatly grouped to create a custom dashboard:

 Single server --- Metrics

Those familiar with Graphviz or Blockdiag are likely to appreciate the option of exporting the entire dashboard to a JSON file:

 Single server --- Metrics

Furthermore metrics can — and they should — be linked to alerts:

 Single Server --- Available Alerts

Query statistics can be tracked by means of Query Store and visualized with Query Performance Insight. For that, a couple of Azure specific parameters will need to be enabled:

postgres@pg10:5432 postgres> select * from pg_settings where name ~ 'pgms_wait_sampling.query_capture_mode|pg_qs.query_capture_mode';

-[ RECORD 1 ]---+------------------------------------------------------------------------------------------------------------------

name            | pg_qs.query_capture_mode

setting         | top

unit            |

category        | Customized Options

short_desc      | Selects which statements are tracked by pg_qs. Need to reload the config to make change take effect.

extra_desc      |

context         | superuser

vartype         | enum

source          | configuration file

min_val         |

max_val         |

enumvals        | {none,top,all}

boot_val        | none

reset_val       | top

sourcefile      |

sourceline      |

pending_restart | f

-[ RECORD 2 ]---+------------------------------------------------------------------------------------------------------------------

name            | pgms_wait_sampling.query_capture_mode

setting         | all

unit            |

category        | Customized Options

short_desc      | Selects types of wait events are tracked by this extension. Need to reload the config to make change take effect.

extra_desc      |

context         | superuser

vartype         | enum

source          | configuration file

min_val         |

max_val         |

enumvals        | {none,all}

boot_val        | none

reset_val       | all

sourcefile      |

sourceline      |

pending_restart | f

In order to visualize the slow queries and waits we proceed to the Query Performance widget:

Long Running Queries​​​

 Single server --- Long running queries graph

Wait Statistics

 Single server --- wait statistics

PostgreSQL Logging on Azure

The standard PostgreSQL logs can be downloaded, or exported to Log Analytics for more advanced parsing:

 Single server --- Log Analytics

PostgreSQL Performance and Scaling with Azure

While the number of vCores can be easily increased or decreased, this action will trigger a server restart:

 Single server PostgreSQL versions

In order to achieve zero downtime applications must be able to gracefully handle transient errors.

For tuning queries, Azure provides the DBA with Performance Recommendations, in addition to the preloaded pg_statements and pg_buffercache extensions:

 Single server --- Performance Recommendations screen

High Availability and Replication on Azure

Database server high availability is achieved by means of a node based hardware replication. This ensures that in the case of hardware failure, a new node can be brought up within tens of seconds.

Azure provides a redundant gateway as a network connection endpoint for all database servers within a region.

PostgreSQL Security on Azure

By default firewall rules deny access to the PostgreSQL instance. Since an Azure database server is the equivalent of a database cluster the access rules will apply to all databases hosted on the server.

In addition to IP addresses, firewall rules can reference virtual network, a feature available only for General Purpose and Memory Optimized tiers.

 Single server --- Firewall --- Adding a VNet

One thing I found peculiar in the firewall web interface — I could not navigate away from the page while changes were being saved:

 Single server --- change security rules in progress pop-up screen when attempting to navigate away

Data at rest is encrypted using a Server-Managed Key and cloud users cannot disable the encryption. Data in transit is also encrypted — SSL required can only be changed after the database server is created. Just as the data at rest, backups are encrypted and encryption cannot be disabled.

Advanced Threat Protection provides alerts and recommendations on a number of database access requests that are considered a security risk. The feature is currently in preview. To demonstrate, I simulated a password brute force attack:

~ $ while : ; do psql -U $(pwgen -s 20 1)@pg10 ; sleep 0.1 ; done

psql: FATAL:  password authentication failed for user "AApT6z4xUzpynJwiNAYf"

psql: FATAL:  password authentication failed for user "gaNeW8VSIflkdnNZSpNV"

psql: FATAL:  password authentication failed for user "SWZnY7wGTxdLTLcbqnUW"

psql: FATAL:  password authentication failed for user "BVH2SC12m9js9vZHcuBd"

psql: FATAL:  password authentication failed for user "um9kqUxPIxeQrzWQXr2v"

psql: FATAL:  password authentication failed for user "8BGXyg3KHF3Eq3yHpik1"

psql: FATAL:  password authentication failed for user "5LsVrtBjcewd77Q4kaj1"

....

Check the PostgreSQL logs:

2019-08-19 07:13:50 UTC-5d5a4c2e.138-FATAL:  password authentication failed

for user "AApT6z4xUzpynJwiNAYf"

2019-08-19 07:13:50 UTC-5d5a4c2e.138-DETAIL:  Role "AApT6z4xUzpynJwiNAYf" does not exist.

   Connection matched pg_hba.conf line 3: "host all all 173.180.222.170/32 password"

2019-08-19 07:13:51 UTC-5d5a4c2f.13c-LOG:  connection received: host=173.180.222.170 port=27248 pid=316

2019-08-19 07:13:51 UTC-5d5a4c2f.13c-FATAL:  password authentication failed for user "gaNeW8VSIflkdnNZSpNV"

2019-08-19 07:13:51 UTC-5d5a4c2f.13c-DETAIL:  Role "gaNeW8VSIflkdnNZSpNV" does not exist.

   Connection matched pg_hba.conf line 3: "host all all 173.180.222.170/32 password"

2019-08-19 07:13:52 UTC-5d5a4c30.140-LOG:  connection received: host=173.180.222.170 port=58256 pid=320

2019-08-19 07:13:52 UTC-5d5a4c30.140-FATAL:  password authentication failed for user "SWZnY7wGTxdLTLcbqnUW"

2019-08-19 07:13:52 UTC-5d5a4c30.140-DETAIL:  Role "SWZnY7wGTxdLTLcbqnUW" does not exist.

   Connection matched pg_hba.conf line 3: "host all all 173.180.222.170/32 password"

2019-08-19 07:13:53 UTC-5d5a4c31.148-LOG:  connection received: host=173.180.222.170 port=32984 pid=328

2019-08-19 07:13:53 UTC-5d5a4c31.148-FATAL:  password authentication failed for user "BVH2SC12m9js9vZHcuBd"

2019-08-19 07:13:53 UTC-5d5a4c31.148-DETAIL:  Role "BVH2SC12m9js9vZHcuBd" does not exist.

   Connection matched pg_hba.conf line 3: "host all all 173.180.222.170/32 password"

2019-08-19 07:13:53 UTC-5d5a4c31.14c-LOG:  connection received: host=173.180.222.170 port=43384 pid=332

2019-08-19 07:13:54 UTC-5d5a4c31.14c-FATAL:  password authentication failed for user "um9kqUxPIxeQrzWQXr2v"

2019-08-19 07:13:54 UTC-5d5a4c31.14c-DETAIL:  Role "um9kqUxPIxeQrzWQXr2v" does not exist.

   Connection matched pg_hba.conf line 3: "host all all 173.180.222.170/32 password"

2019-08-19 07:13:54 UTC-5d5a4c32.150-LOG:  connection received: host=173.180.222.170 port=27672 pid=336

2019-08-19 07:13:54 UTC-5d5a4c32.150-FATAL:  password authentication failed for user "8BGXyg3KHF3Eq3yHpik1"

2019-08-19 07:13:54 UTC-5d5a4c32.150-DETAIL:  Role "8BGXyg3KHF3Eq3yHpik1" does not exist.

   Connection matched pg_hba.conf line 3: "host all all 173.180.222.170/32 password"

2019-08-19 07:13:55 UTC-5d5a4c33.154-LOG:  connection received: host=173.180.222.170 port=12712 pid=340

2019-08-19 07:13:55 UTC-5d5a4c33.154-FATAL:  password authentication failed for user "5LsVrtBjcewd77Q4kaj1"

2019-08-19 07:13:55 UTC-5d5a4c33.154-DETAIL:  Role "5LsVrtBjcewd77Q4kaj1" does not exist.

The email alert arrived about 30 minutes later:

 Single server --- Advanced Threat Protection email alert

In order to allow fine grained access to database server, Azure provides RBAC, which is a cloud native access control feature, just one more tool in the arsenal of the PostgreSQL Cloud DBA. This is as close as we can get to the ubiquitous pg_hba access rules.

PostgreSQL Backup and Recovery on Azure

Regardless of pricing tiers, backups are retained between 7 and 35 days. The pricing tier also influences the ability to restore data.

Point-in-time recovery is available via the Azure Portal or the CLI and according to documentation as granular as up to five minutes. The portal functionality is rather limited — the date picker widget blindly shows the last 7 days as possible dates to select, although I created the server today. Also, there is no verification performed on the recovery target time — I expected that entering a value outside the recovery interval would trigger an error preventing the wizard to continue:

 Single server --- point-in-time restore screen

Once the restore process is started, an error, supposedly caused by the out of range value, will popup about a minute later:

 Single server --- Activity Log error message on restore failure

…but, unfortunately, the error message was not very helpful:

 Single server --- Activity Log error details on restore failure

Lastly, backup storage is free for retention periods of up to 7 days. That could prove extremely handy for development environments.

Hints and Tips

Limits

Get accustomed with the Single Server Limits.

Connectivity

Always use the connection string in order for the connection to be routed to the correct database server.

Replication

For disaster recovery scenarios, locate read replicas in one of the paired regions.

Roles

Just as is the case with AWS and GCloud, there is no superuser access.

GUCs

Parameters requiring a server restart or superuser access cannot be configured.

Scaling

During auto-scaling, applications should retry until the new node is brought up.

Memory amount and IOPS cannot be specified — memory is allocated in units of GB per vCore, up to a maximum of 320GB (32vCores x 10GB), and IOPS are dependent on the size of the provisioned storage to a maximum of 6000 IOPS. At this time Azure offers a large storage preview option with a maximum of 20,000 IOPS.

Servers created in the Basic tier cannot be upgraded to General Purpose or Memory Optimized.

Storage

Ensure that the auto-grow feature is enabled — if the amount of data exceed the provisioned storage space, the database will enter in read-only mode.

Storage can only be scaled up. Just as with all the other cloud providers storage allocation cannot be decreased and I couldn’t come across any explanation. Given the state of the art equipment, the big cloud players can afford there should be no reason for not providing features similar to LVM online data relocation. Storage is really cheap nowadays, there is really no reason to think about scaling down until the next major version upgrade.

Firewall

In some cases, updates to firewall rules may take up to five minutes to propagate.

A server is located in the same subnet as the application servers will not be reachable until the appropriate firewall rules are in place.

Virtual network rules do not allow cross-region access and as a result, dblink and postgres_fdw cannot be used to connect to databases outside the Azure cloud.

The VNet/Subnet approach cannot be applied to Web Apps as their connections originate from public IP addresses.

Large virtual networks will be unavailable while the service endpoints are enabled.

Encryption

For applications that require server certificate validation, the file is available for download from Digicert. Microsoft made it easy and you shouldn’t have to worry about renewal until 2025:

~ $ openssl x509 -in BaltimoreCyberTrustRoot.crt.pem -noout -dates

notBefore=May 12 18:46:00 2000 GMT

notAfter=May 12 23:59:00 2025 GMT

Intrusion Detection System

The preview release of Advanced Threat Protection is not available for the Basic tier instances.

Backup and Restore

For applications that cannot afford a region downtime, consider configuring the server with geo-redundant backup storage. This option can only be enabled at the time of creating the database server.

The requirement for reconfiguring the cloud firewall rules after a PITR operation is particularly important.

Deleting a database server removes all backups.

Following the restore, there are certain post-restore tasks that will have to be performed.

Unlogged tables are recommended for bulk inserts in order to boost performance, however, they are not replicated.

Monitoring

Metrics are recorded every minute and stored for 30 days.

Logging

Query Store is a global option, meaning that it applies to all databases. Read-only transactions and queries longer than 6,000 bytes are problematic. By default, the captured queries are retained for 7 days.

Performance

Query Performance Insight recommendations are currently limited to create and drop index.

Disable pg_stat_staements when not needed.

Replace uuid_generate_v4 with gen_random_uuid(). This is inline with the recommendation in the official PostgreSQL documentation, see Building uuid-ossp.

High Availability and Replication

There is a limit of five read replicas. Write-intensive applications should avoid using read replicas as the replication mechanism is asynchronous which introduces some delays that applications must be able to tolerate. Read replicas can be located in a different region.

REPLICA support can only be enabled after the server was created. The feature requires a server restart:

 Single server --- enabling replication
 Single server --- read replica missing firewall rules after creation

Read replicas do not inherit the firewall rules from master node:

 Single server --- read replica missing firewall rules after creation

Failover to read replica is not automatic. The failover mechanism is node based.

There is a long list of Considerations that needs to be reviewed before configuring read replicas.

Creating replicas takes a long time, even when I tested with relatively small data set:

 Single server --- Replicas creation taking a long time
 
Vacuum

Vacuum

Review the key parameters, as Azure Database for PostgreSQL ships with upstream vacuum default values:

postgres@pg10:5432 postgres> select name,setting from pg_settings where name ~ '^autovacuum.*';

               name                 | setting

-------------------------------------+-----------

autovacuum                          | on

autovacuum_analyze_scale_factor     | 0.05

autovacuum_analyze_threshold        | 50

autovacuum_freeze_max_age           | 200000000

autovacuum_max_workers              | 3

autovacuum_multixact_freeze_max_age | 400000000

autovacuum_naptime                  | 15

autovacuum_vacuum_cost_delay        | 20

autovacuum_vacuum_cost_limit        | -1

autovacuum_vacuum_scale_factor      | 0.05

autovacuum_vacuum_threshold         | 50

autovacuum_work_mem                 | -1

(12 rows)

Upgrades

Automatic major upgrades are not supported. As mentioned earlier, this is a cost savings opportunity, by scaling down the auto-grown storage.

PostgreSQL Azure Enhancements

Timeseries

TimescaleDB is available as an extension (not part of the PostgreSQL modules), however, it is just a few clicks away. The only drawback being the older version 1.1.1, while the upstream version is currently at  1.4.1 (2019-08-01).

postgres@pg10:5432 postgres> CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

WARNING:

WELCOME TO

_____ _                               _ ____________

|_   _(_)                             | | | _  ___ 

| |  _ _ __ ___   ___ ___ ___ __ _| | ___| | | | |_/ /

| | | |  _ ` _  / _ / __|/ __/ _` | |/ _  | | | ___ 

| | | | | | | | |  __/__  (_| (_| | |  __/ |/ /| |_/ /

|_| |_|_| |_| |_|___||___/_____,_|_|___|___/ ____/

               Running version 1.1.1

For more information on TimescaleDB, please visit the following links:



1. Getting started: https://docs.timescale.com/getting-started

2. API reference documentation: https://docs.timescale.com/api

3. How TimescaleDB is designed: https://docs.timescale.com/introduction/architecture




CREATE EXTENSION



postgres@pg10:5432 postgres> dx timescaledb

                                    List of installed extensions

   Name     | Version | Schema |                            Description

-------------+---------+--------+-------------------------------------------------------------------

timescaledb | 1.1.1   | public | Enables scalable inserts and complex queries for time-series data

(1 row)

Logging

In addition to PostgreSQL logging options, Azure Database for PostgreSQL can be configured to record additional diagnostics events.

Firewall

Azure Portal includes a handy feature for allowing connections from the IP addresses logged in to the portal:

 Single server --- Firewall --- Add Client IP Address

I noted the feature as it makes it easy for developers and system administrators to allow themselves in, and it stands out as a feature not offered by neither AWS, nor GCloud.

Conclusion

Azure Database for PostgreSQL Single Server offers enterprise level services, however, many of these services are still in preview mode: Query Store, Performance Insight, Performance Recommendation, Advanced Threat Protection, Large Storage, Cross-region Read Replicas.

While operating system knowledge is no longer required for administering PostgreSQL in the Azure cloud, the DBA is expected to acquire skills which are not limited to the database itself — Azure networking (VNet), connection security (firewall), log viewer and analytics along with KQL, Azure CLI for handy scripting, and the list goes on.

Lastly, for those planning to migrate their PostgreSQL workloads to Azure, there are a number of resources available along with a select list of Azure Partners including Credativ, one of the PostgreSQL major sponsors and contributors.

 

Subscribe below to be notified of fresh posts