blog
Cloud Vendor Deep-Dive: PostgreSQL on Microsoft Azure
If you have followed Microsoft lately it will come as no surprise that the provider of a competing database product, namely SQL Server, also jumped on the PostgreSQL bandwagon. From releasing 60,000 patents to OIN to being Platinum sponsor at PGCon, Microsoft as one of the PostgreSQL corporate backing organizations. Took every opportunity for showing that not only can you run PostgreSQL on Microsoft, but also the reverse is true: Microsoft, through its cloud offering, can run PostgreSQL for you. The statement became even more clear with the acquisition of Citus Data and the release of their flagship product in the Azure Cloud under the name of Hyperscale. It is safe to say that PostgreSQL adoption is growing and now there are even more good reasons to choose it.
My journey through the Azure cloud started right at the landing page where I meet the contenders: Single Server and a preview (in other words no SLA provided) release of Hyperscale (Citus). This blog will focus on the former. While on this journey, I had the opportunity to practice what open source is all about — giving back to the community — in this case, by providing feedback to the documentation that, to Microsoft’s credit, they make this very easy by piping the feedback straight into Github:
PostgreSQL Compatibility with Azure
Versioning
According to product documentation Single Server targets PostgreSQL versions in the n-2 major range:
As a solution built for performance Single Server is recommended for data sets 100 GB and larger. The servers provided predictable performance — the database instances come with a predefined number of vCores and IOPS (based on the size of provisioned storage).
Extensions
There is a fair number of Supported Extensions with some of them being installed out of the box:
postgres@pg10:5432 postgres> select name, default_version, installed_version from pg_available_extensions where name !~ '^postgis' order by name;
name | default_version | installed_version
------------------------------+-----------------+-------------------
address_standardizer | 2.4.3 |
address_standardizer_data_us | 2.4.3 |
btree_gin | 1.2 |
btree_gist | 1.5 |
chkpass | 1.0 |
citext | 1.4 |
cube | 1.2 |
dblink | 1.2 |
dict_int | 1.0 |
earthdistance | 1.1 |
fuzzystrmatch | 1.1 |
hstore | 1.4 |
hypopg | 1.1.1 |
intarray | 1.2 |
isn | 1.1 |
ltree | 1.1 |
orafce | 3.7 |
pg_buffercache | 1.3 | 1.3
pg_partman | 2.6.3 |
pg_prewarm | 1.1 |
pg_qs | 1.1 |
pg_stat_statements | 1.6 | 1.6
pg_trgm | 1.3 |
pg_wait_sampling | 1.1 |
pgcrypto | 1.3 |
pgrouting | 2.5.2 |
pgrowlocks | 1.2 |
pgstattuple | 1.5 |
plpgsql | 1.0 | 1.0
plv8 | 2.1.0 |
postgres_fdw | 1.0 |
tablefunc | 1.0 |
timescaledb | 1.1.1 |
unaccent | 1.1 |
uuid-ossp | 1.1 |
(35 rows)
PostgreSQL Monitoring on Azure
Server monitoring relies on a set of metrics that can be neatly grouped to create a custom dashboard:
Those familiar with Graphviz or Blockdiag are likely to appreciate the option of exporting the entire dashboard to a JSON file:
Furthermore metrics can — and they should — be linked to alerts:
Query statistics can be tracked by means of Query Store and visualized with Query Performance Insight. For that, a couple of Azure specific parameters will need to be enabled:
postgres@pg10:5432 postgres> select * from pg_settings where name ~ 'pgms_wait_sampling.query_capture_mode|pg_qs.query_capture_mode';
-[ RECORD 1 ]---+------------------------------------------------------------------------------------------------------------------
name | pg_qs.query_capture_mode
setting | top
unit |
category | Customized Options
short_desc | Selects which statements are tracked by pg_qs. Need to reload the config to make change take effect.
extra_desc |
context | superuser
vartype | enum
source | configuration file
min_val |
max_val |
enumvals | {none,top,all}
boot_val | none
reset_val | top
sourcefile |
sourceline |
pending_restart | f
-[ RECORD 2 ]---+------------------------------------------------------------------------------------------------------------------
name | pgms_wait_sampling.query_capture_mode
setting | all
unit |
category | Customized Options
short_desc | Selects types of wait events are tracked by this extension. Need to reload the config to make change take effect.
extra_desc |
context | superuser
vartype | enum
source | configuration file
min_val |
max_val |
enumvals | {none,all}
boot_val | none
reset_val | all
sourcefile |
sourceline |
pending_restart | f
In order to visualize the slow queries and waits we proceed to the Query Performance widget:
Long Running Queries
Wait Statistics
PostgreSQL Logging on Azure
The standard PostgreSQL logs can be downloaded, or exported to Log Analytics for more advanced parsing:
PostgreSQL Performance and Scaling with Azure
While the number of vCores can be easily increased or decreased, this action will trigger a server restart:
In order to achieve zero downtime applications must be able to gracefully handle transient errors.
For tuning queries, Azure provides the DBA with Performance Recommendations, in addition to the preloaded pg_statements and pg_buffercache extensions:
High Availability and Replication on Azure
Database server high availability is achieved by means of a node based hardware replication. This ensures that in the case of hardware failure, a new node can be brought up within tens of seconds.
Azure provides a redundant gateway as a network connection endpoint for all database servers within a region.
PostgreSQL Security on Azure
By default firewall rules deny access to the PostgreSQL instance. Since an Azure database server is the equivalent of a database cluster the access rules will apply to all databases hosted on the server.
In addition to IP addresses, firewall rules can reference virtual network, a feature available only for General Purpose and Memory Optimized tiers.
One thing I found peculiar in the firewall web interface — I could not navigate away from the page while changes were being saved:
Data at rest is encrypted using a Server-Managed Key and cloud users cannot disable the encryption. Data in transit is also encrypted — SSL required can only be changed after the database server is created. Just as the data at rest, backups are encrypted and encryption cannot be disabled.
Advanced Threat Protection provides alerts and recommendations on a number of database access requests that are considered a security risk. The feature is currently in preview. To demonstrate, I simulated a password brute force attack:
~ $ while : ; do psql -U $(pwgen -s 20 1)@pg10 ; sleep 0.1 ; done
psql: FATAL: password authentication failed for user "AApT6z4xUzpynJwiNAYf"
psql: FATAL: password authentication failed for user "gaNeW8VSIflkdnNZSpNV"
psql: FATAL: password authentication failed for user "SWZnY7wGTxdLTLcbqnUW"
psql: FATAL: password authentication failed for user "BVH2SC12m9js9vZHcuBd"
psql: FATAL: password authentication failed for user "um9kqUxPIxeQrzWQXr2v"
psql: FATAL: password authentication failed for user "8BGXyg3KHF3Eq3yHpik1"
psql: FATAL: password authentication failed for user "5LsVrtBjcewd77Q4kaj1"
....
Check the PostgreSQL logs:
2019-08-19 07:13:50 UTC-5d5a4c2e.138-FATAL: password authentication failed
for user "AApT6z4xUzpynJwiNAYf"
2019-08-19 07:13:50 UTC-5d5a4c2e.138-DETAIL: Role "AApT6z4xUzpynJwiNAYf" does not exist.
Connection matched pg_hba.conf line 3: "host all all 173.180.222.170/32 password"
2019-08-19 07:13:51 UTC-5d5a4c2f.13c-LOG: connection received: host=173.180.222.170 port=27248 pid=316
2019-08-19 07:13:51 UTC-5d5a4c2f.13c-FATAL: password authentication failed for user "gaNeW8VSIflkdnNZSpNV"
2019-08-19 07:13:51 UTC-5d5a4c2f.13c-DETAIL: Role "gaNeW8VSIflkdnNZSpNV" does not exist.
Connection matched pg_hba.conf line 3: "host all all 173.180.222.170/32 password"
2019-08-19 07:13:52 UTC-5d5a4c30.140-LOG: connection received: host=173.180.222.170 port=58256 pid=320
2019-08-19 07:13:52 UTC-5d5a4c30.140-FATAL: password authentication failed for user "SWZnY7wGTxdLTLcbqnUW"
2019-08-19 07:13:52 UTC-5d5a4c30.140-DETAIL: Role "SWZnY7wGTxdLTLcbqnUW" does not exist.
Connection matched pg_hba.conf line 3: "host all all 173.180.222.170/32 password"
2019-08-19 07:13:53 UTC-5d5a4c31.148-LOG: connection received: host=173.180.222.170 port=32984 pid=328
2019-08-19 07:13:53 UTC-5d5a4c31.148-FATAL: password authentication failed for user "BVH2SC12m9js9vZHcuBd"
2019-08-19 07:13:53 UTC-5d5a4c31.148-DETAIL: Role "BVH2SC12m9js9vZHcuBd" does not exist.
Connection matched pg_hba.conf line 3: "host all all 173.180.222.170/32 password"
2019-08-19 07:13:53 UTC-5d5a4c31.14c-LOG: connection received: host=173.180.222.170 port=43384 pid=332
2019-08-19 07:13:54 UTC-5d5a4c31.14c-FATAL: password authentication failed for user "um9kqUxPIxeQrzWQXr2v"
2019-08-19 07:13:54 UTC-5d5a4c31.14c-DETAIL: Role "um9kqUxPIxeQrzWQXr2v" does not exist.
Connection matched pg_hba.conf line 3: "host all all 173.180.222.170/32 password"
2019-08-19 07:13:54 UTC-5d5a4c32.150-LOG: connection received: host=173.180.222.170 port=27672 pid=336
2019-08-19 07:13:54 UTC-5d5a4c32.150-FATAL: password authentication failed for user "8BGXyg3KHF3Eq3yHpik1"
2019-08-19 07:13:54 UTC-5d5a4c32.150-DETAIL: Role "8BGXyg3KHF3Eq3yHpik1" does not exist.
Connection matched pg_hba.conf line 3: "host all all 173.180.222.170/32 password"
2019-08-19 07:13:55 UTC-5d5a4c33.154-LOG: connection received: host=173.180.222.170 port=12712 pid=340
2019-08-19 07:13:55 UTC-5d5a4c33.154-FATAL: password authentication failed for user "5LsVrtBjcewd77Q4kaj1"
2019-08-19 07:13:55 UTC-5d5a4c33.154-DETAIL: Role "5LsVrtBjcewd77Q4kaj1" does not exist.
The email alert arrived about 30 minutes later:
In order to allow fine grained access to database server, Azure provides RBAC, which is a cloud native access control feature, just one more tool in the arsenal of the PostgreSQL Cloud DBA. This is as close as we can get to the ubiquitous pg_hba access rules.
PostgreSQL Backup and Recovery on Azure
Regardless of pricing tiers, backups are retained between 7 and 35 days. The pricing tier also influences the ability to restore data.
Point-in-time recovery is available via the Azure Portal or the CLI and according to documentation as granular as up to five minutes. The portal functionality is rather limited — the date picker widget blindly shows the last 7 days as possible dates to select, although I created the server today. Also, there is no verification performed on the recovery target time — I expected that entering a value outside the recovery interval would trigger an error preventing the wizard to continue:
Once the restore process is started, an error, supposedly caused by the out of range value, will popup about a minute later:
…but, unfortunately, the error message was not very helpful:
Lastly, backup storage is free for retention periods of up to 7 days. That could prove extremely handy for development environments.
Hints and Tips
Limits
Get accustomed with the Single Server Limits.
Connectivity
Always use the connection string in order for the connection to be routed to the correct database server.
Replication
For disaster recovery scenarios, locate read replicas in one of the paired regions.
Roles
Just as is the case with AWS and GCloud, there is no superuser access.
GUCs
Parameters requiring a server restart or superuser access cannot be configured.
Scaling
During auto-scaling, applications should retry until the new node is brought up.
Memory amount and IOPS cannot be specified — memory is allocated in units of GB per vCore, up to a maximum of 320GB (32vCores x 10GB), and IOPS are dependent on the size of the provisioned storage to a maximum of 6000 IOPS. At this time Azure offers a large storage preview option with a maximum of 20,000 IOPS.
Servers created in the Basic tier cannot be upgraded to General Purpose or Memory Optimized.
Storage
Ensure that the auto-grow feature is enabled — if the amount of data exceed the provisioned storage space, the database will enter in read-only mode.
Storage can only be scaled up. Just as with all the other cloud providers storage allocation cannot be decreased and I couldn’t come across any explanation. Given the state of the art equipment, the big cloud players can afford there should be no reason for not providing features similar to LVM online data relocation. Storage is really cheap nowadays, there is really no reason to think about scaling down until the next major version upgrade.
Firewall
In some cases, updates to firewall rules may take up to five minutes to propagate.
A server is located in the same subnet as the application servers will not be reachable until the appropriate firewall rules are in place.
Virtual network rules do not allow cross-region access and as a result, dblink and postgres_fdw cannot be used to connect to databases outside the Azure cloud.
The VNet/Subnet approach cannot be applied to Web Apps as their connections originate from public IP addresses.
Large virtual networks will be unavailable while the service endpoints are enabled.
Encryption
For applications that require server certificate validation, the file is available for download from Digicert. Microsoft made it easy and you shouldn’t have to worry about renewal until 2025:
~ $ openssl x509 -in BaltimoreCyberTrustRoot.crt.pem -noout -dates
notBefore=May 12 18:46:00 2000 GMT
notAfter=May 12 23:59:00 2025 GMT
Intrusion Detection System
The preview release of Advanced Threat Protection is not available for the Basic tier instances.
Backup and Restore
For applications that cannot afford a region downtime, consider configuring the server with geo-redundant backup storage. This option can only be enabled at the time of creating the database server.
The requirement for reconfiguring the cloud firewall rules after a PITR operation is particularly important.
Deleting a database server removes all backups.
Following the restore, there are certain post-restore tasks that will have to be performed.
Unlogged tables are recommended for bulk inserts in order to boost performance, however, they are not replicated.
Monitoring
Metrics are recorded every minute and stored for 30 days.
Logging
Query Store is a global option, meaning that it applies to all databases. Read-only transactions and queries longer than 6,000 bytes are problematic. By default, the captured queries are retained for 7 days.
Performance
Query Performance Insight recommendations are currently limited to create and drop index.
Disable pg_stat_staements when not needed.
Replace uuid_generate_v4 with gen_random_uuid(). This is inline with the recommendation in the official PostgreSQL documentation, see Building uuid-ossp.
High Availability and Replication
There is a limit of five read replicas. Write-intensive applications should avoid using read replicas as the replication mechanism is asynchronous which introduces some delays that applications must be able to tolerate. Read replicas can be located in a different region.
REPLICA support can only be enabled after the server was created. The feature requires a server restart:
Read replicas do not inherit the firewall rules from master node:
Failover to read replica is not automatic. The failover mechanism is node based.
There is a long list of Considerations that needs to be reviewed before configuring read replicas.
Creating replicas takes a long time, even when I tested with relatively small data set:
Vacuum
Review the key parameters, as Azure Database for PostgreSQL ships with upstream vacuum default values:
postgres@pg10:5432 postgres> select name,setting from pg_settings where name ~ '^autovacuum.*';
name | setting
-------------------------------------+-----------
autovacuum | on
autovacuum_analyze_scale_factor | 0.05
autovacuum_analyze_threshold | 50
autovacuum_freeze_max_age | 200000000
autovacuum_max_workers | 3
autovacuum_multixact_freeze_max_age | 400000000
autovacuum_naptime | 15
autovacuum_vacuum_cost_delay | 20
autovacuum_vacuum_cost_limit | -1
autovacuum_vacuum_scale_factor | 0.05
autovacuum_vacuum_threshold | 50
autovacuum_work_mem | -1
(12 rows)
Upgrades
Automatic major upgrades are not supported. As mentioned earlier, this is a cost savings opportunity, by scaling down the auto-grown storage.
PostgreSQL Azure Enhancements
Timeseries
TimescaleDB is available as an extension (not part of the PostgreSQL modules), however, it is just a few clicks away. The only drawback being the older version 1.1.1, while the upstream version is currently at 1.4.1 (2019-08-01).
postgres@pg10:5432 postgres> CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;
WARNING:
WELCOME TO
_____ _ _ ____________
|_ _(_) | | | _ ___
| | _ _ __ ___ ___ ___ ___ __ _| | ___| | | | |_/ /
| | | | _ ` _ / _ / __|/ __/ _` | |/ _ | | | ___
| | | | | | | | | __/__ (_| (_| | | __/ |/ /| |_/ /
|_| |_|_| |_| |_|___||___/_____,_|_|___|___/ ____/
Running version 1.1.1
For more information on TimescaleDB, please visit the following links:
1. Getting started: https://docs.timescale.com/getting-started
2. API reference documentation: https://docs.timescale.com/api
3. How TimescaleDB is designed: https://docs.timescale.com/introduction/architecture
CREATE EXTENSION
postgres@pg10:5432 postgres> dx timescaledb
List of installed extensions
Name | Version | Schema | Description
-------------+---------+--------+-------------------------------------------------------------------
timescaledb | 1.1.1 | public | Enables scalable inserts and complex queries for time-series data
(1 row)
Logging
In addition to PostgreSQL logging options, Azure Database for PostgreSQL can be configured to record additional diagnostics events.
Firewall
Azure Portal includes a handy feature for allowing connections from the IP addresses logged in to the portal:
I noted the feature as it makes it easy for developers and system administrators to allow themselves in, and it stands out as a feature not offered by neither AWS, nor GCloud.
Conclusion
Azure Database for PostgreSQL Single Server offers enterprise level services, however, many of these services are still in preview mode: Query Store, Performance Insight, Performance Recommendation, Advanced Threat Protection, Large Storage, Cross-region Read Replicas.
While operating system knowledge is no longer required for administering PostgreSQL in the Azure cloud, the DBA is expected to acquire skills which are not limited to the database itself — Azure networking (VNet), connection security (firewall), log viewer and analytics along with KQL, Azure CLI for handy scripting, and the list goes on.
Lastly, for those planning to migrate their PostgreSQL workloads to Azure, there are a number of resources available along with a select list of Azure Partners including Credativ, one of the PostgreSQL major sponsors and contributors.