According to StackOverflow, PostgreSQL is the most popular open-source database, with more and more developers and administrators wanting to work with it. PostgreSQL 14, the latest version released in September 2021, has more than 200 improvements and new features, which broadly fall into three areas: Security, Performance, and Monitoring.
Beginning in version 1.9.2, released earlier this year, ClusterControl now supports the latest version of PostgreSQL 14.
In this post, we will look at the most notable features, major updates, and improvements available in PostgreSQL 14.
PostgreSQL 14’s most significant security change is moving to SCRAM as the default authentication after SCRAM-SHA-256 authentication was first introduced in PostgreSQL 10. SCRAM provides a much more simple and powerful solution to regulate compliance for data security. The previous default MD5 authentication has had some exploited weaknesses in the past.
Alongside improved authentication, PostgreSQL 14 simplifies the process of assigning write-only and read-only capabilities to users on tables, views, and schemas by introducing two predefined roles, pg_write_all_data, and pg_read_all_data. The former role makes it convenient to create super-user-styled privileges. Despite being convenient for activities and access, this kind of account has to be used sparingly. The latter role makes it handy in granting read-only access for a user to all tables, schemas, and views in the database. This role will have read access by default to any newly created tables.
Corporations are processing more and more data, and the velocity, volume, & variety (the three Vs of Big Data) show no signs of slowing down. PostgreSQL 14 has made some very significant changes to be able to scale up and out to support these workloads and improve performance. This can be best grouped into the following five areas:
Scalability and Performance
PostgreSQL 14 delivers huge achievements in the performance of deletes and updates on tables with a high number of partitions. This allows deletes and updates on partitioned tables to use execution-time partition pruning. Most importantly, this version can scale to more partitions than before, allowing PostgreSQL to cope with much bigger databases than previously was possible.
In PostgreSQL version 14, partitions can now also be detached concurrently, making it easier to manage large time-series data tables. This is possible due to the new date_bin() function, allowing users to summarize time-series data into date ranges better and more easily.
PostgreSQL 14 added support for LZ4 compression for TOAST. LZ4 uses the latest compression technology and focuses on improving the speed of decompression and compression of document data. LZ4 will enhance applications with larger XML and JSON documents, and this compression algorithm can be configured at both the system and column levels.
Distributed Databases and Workloads
PostgreSQL 14 also offers improvements in horizontally scaling a database by introducing two major enhancements in postgres_fdw (foreign data wrapper) that are instrumental in improving performance for distributed databases. These changes support the bulk insert of data and parallel table scans on foreign tables (when async_capable is set).
Connection caching also improves performance by allowing connections to be open across transactions. We can also mention Libpd pipeline mode, which increases the performance of distributed databases by allowing streams of changes to be transmitted without waiting for network confirmation on each command.
PostgreSQL 14 also addresses the issue of B-tree index bloat; indexes that are frequently updated bloat over time because of dead tuples that can be removed by running a vacuum.
Due to this, an irreversible page split can be caused by an insert or update when the page gets filled up between vacuums. This version has enhanced the vacuum process by detecting and removing dead tuples in-between vacuums (INDEX_CLEANUP parameter set to Auto).
Another improvement on the vacuum system is to remove deleted pages eagerly. In previous versions, removing deleted pages took two vacuum cycles, by first marking the page as deleted and the second cycle actually freeing up that space.
Query and Index Performance Tuning
Index management in application tuning is another major improvement in PostgreSQL 14. Particularly, REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY, allow multiple commands to run at once without interfering with each other. More on indexes, BRIN indexes have become more flexible as they can record numerous min/max values per range, which is valuable for groups of values in each page range. Also, BRIN indexes effectively work with data not well-localized in the heap by using bloom filters.
The FREEZE option in the COPY command has been enhanced by updating page visibility bits on data loading performance. Binary-mode COPY is also now faster.
PostgreSQL 14 has refined query parallelism by adding query parallelism to REFRESH MATERIALIZED VIEW and RETURN QUERY. These improvements have been rolled out to nested loop joins and parallel sequential scans.
PostgreSQL 14 has increased the number of places extended statistics can be used for expressions and OR clause estimation. Unlike previously, where statistics were only on columns, it is now allowed on a group of expressions. System view pg_stats_ext_exprs reports such statistics.
Enhanced Logical Replication
Logical replication has been improved to allow streaming of long in-progress transactions via the API or to subscribers. In previous versions, transactions larger than logical_decoding_work_mem were written to disk until the transaction was completed before replicating the transaction to the subscriber. With the new enhancements in place, logical replication handles large volumes of DDL statements more efficiently.
After each command, logical replication writes cache invalidation messages to the WAL for processing by streaming in-progress transactions. This process creates significant performance benefits.
One of the biggest buzzwords of 2022 is monitoring, as administrators and developers want more insight into performance over time. PostgreSQL 14 has introduced new features to help with monitoring. The most significant change of them all is the moving of the query hash system from pg_stat_statement to the core database. Query monitoring has been allowed across several PostgreSQL systems and logging functions by using a single ID.
This version has also introduced the following new monitoring parameters:
REINDEX on child indexes of a partitioned table
Upgrading to PostgreSQL 14
If you want to upgrade your current PostgreSQL version to version 14, you have three main options which will perform this task:
Pg_dumpall: It is a logical backup tool that allows you to dump your data and restore it in the new PostgreSQL version. Here you will have a downtime period that will vary according to your data size. You need to stop the system or avoid new data in the primary node, run the pg_dumpall, move the generated dump to the new database node, and restore it. During this time, you can’t write into your primary PostgreSQL database to avoid data inconsistency.
Pg_upgrade: It is a PostgreSQL tool to upgrade your PostgreSQL version. It could be dangerous in a production environment, and we don’t recommend this method in that case. Using this method, you will have downtime, but it will probably be considerably less than using the previous pg_dump method.
Logical Replication: Since PostgreSQL 10, you can use this replication method which allows you to perform major version upgrades with zero (or almost zero) downtime. In this way, you can add a standby node in the last PostgreSQL version, and when the replication is up-to-date, you can perform a failover process to promote the new PostgreSQL node.
For more detailed information about the new PostgreSQL 14 features, you can refer to the Official Documentation.
PostgreSQL 14 has really upped the game for users of this popular open-source database. Now developers and administrators have improved capabilities for boosting high performance and scalability, enhancements for distributed databases, improved observability, and more.
ClusterControl makes upgrading to the latest technology versions simple and stress-free. If you’re not already using ClusterControl, you can evaluate it free for 30 days. For more details on the latest release of ClusterControl 1.9.2, check out the full release notes.