How to Optimize PostgreSQL Logical Replication
Logical Replication and pglogical are table-level, WAL-based replication mechanisms that replicate the data of specific tables between two PostgreSQL instances. There seems to be some confusion between "pglogical" and "Logical Replication". Both provide the same kind of replication mechanism, with some differences in features and capabilities. Logical Replication was introduced in PostgreSQL 10 as an in-built feature, unlike pglogical, which is an extension. pglogical, with ongoing continuous development, remains the only option for implementing logical replication in environments running PostgreSQL versions prior to 10. Eventually, all the features of pglogical will be part of Logical Replication; in other words, pglogical (the extension) became Logical Replication (the in-built feature). The basic advantage of Logical Replication is that it does not need any extensions to be installed or created, which in turn benefits environments where installing extensions is restricted.
This blog will focus on optimizing Logical Replication. That means the optimization tips and techniques highlighted here apply to both pglogical and Logical Replication.
Logical Replication is a WAL-based replication mechanism, the first of its kind built into PostgreSQL. As a DBA, I find it a much more reliable and performant replication mechanism compared to trigger-based replication solutions. The changes made to the tables that are part of the replication are replicated in real time via WAL records, which makes it highly efficient and uncomplicated. Most other table-level replication solutions on the market (such as Slony or Bucardo) are trigger based, which can pose performance and maintenance challenges. With Logical Replication, the dependency on trigger-based replication is almost gone.
There are other blogs that explain how to configure Logical Replication in detail.
In this blog, the focus will be on how to optimize Logical Replication.
Optimizing Logical Replication
To begin with, the behaviour of Logical Replication is quite similar to Streaming Replication; the main difference is that streaming replication replicates the complete database cluster, whereas Logical Replication replicates only individual tables. When choosing specific individual tables to replicate, there are factors and challenges to anticipate.
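As a quick refresher, here is a minimal sketch of a table-level replication setup. The publication, subscription, table names, and connection details are hypothetical and would need to be adapted to your environment:

-- On the provider (publisher) node:
CREATE PUBLICATION my_pub FOR TABLE orders, customers;

-- On the subscriber node (this also creates a replication slot on the provider):
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=provider.example.com dbname=appdb user=repuser password=secret'
    PUBLICATION my_pub;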
Let us take a look at the factors influencing Logical Replication.
Factors Influencing Logical Replication Performance
Optimizing Logical Replication is important to ensure data is replicated seamlessly without any interruptions. There are factors to foresee before setting it up. Let us take a look at them:
- The type of data stored in the tables to be replicated
- How transactionally active the replicated tables are
- Infrastructure capacity must be foreseen
- Parameter configuration must be optimally done
All of the above factors influence Logical Replication to a great extent. Let us take a look at them in detail.
PostgreSQL Logical Replication Data Types
Understanding the type of data stored in the table is important. If a table that is part of the replication stores large text or binary objects and encounters a high number of transactions, then replication might slow down due to high usage of infrastructure resources. The capacity of the infrastructure must be adequate to handle such complex and large data replication.
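Before adding a table to a publication, it can help to check which columns may hold large values. A hedged sketch, assuming the tables of interest live in the public schema:

SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
  AND data_type IN ('bytea', 'text', 'jsonb', 'xml');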
How Transactionally Active the Replicated Tables Are
When replicating highly transactionally active tables, replication might lag behind because of I/O performance issues, deadlocks, etc., which need to be taken into consideration. This can affect the health of production database environments. If the number of tables being replicated is high and the data is replicated to multiple sites, then CPU usage might be high and more CPUs (or CPU cores) may be required.
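On PostgreSQL 10 and above, the lag of each subscriber can be watched from the provider. This is a simple monitoring sketch, not a complete health check:

-- One row per connected subscriber / WAL sender
SELECT application_name, state, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;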
Infrastructure Capacity
Before considering Logical Replication as a solution, it is important to ensure the infrastructure capacity of the database servers is adequate. If a high number of tables is being replicated, then there must be enough CPUs available to do the replication job.
When replicating a high number of tables, consider splitting them into groups and replicating the groups in parallel, as shown in the sketch below. Again, this will need multiple CPUs to be available for replication. If the data changes to the tables being replicated are frequent and large, this might impact replication performance as well.
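A minimal sketch of this grouping approach; the publication names, table names, and connection string are hypothetical:

-- On the provider: one publication per table group
CREATE PUBLICATION pub_group_a FOR TABLE orders, order_items;
CREATE PUBLICATION pub_group_b FOR TABLE customers, invoices;

-- On the subscriber: one subscription per group, so the groups replicate in parallel
CREATE SUBSCRIPTION sub_group_a
    CONNECTION 'host=provider.example.com dbname=appdb user=repuser'
    PUBLICATION pub_group_a;
CREATE SUBSCRIPTION sub_group_b
    CONNECTION 'host=provider.example.com dbname=appdb user=repuser'
    PUBLICATION pub_group_b;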
Optimizing Parameters for Logical Replication
Parameters configured for Logical Replication functioning must be tuned optimally to ensure replication does not break.
Let us first take a look at the parameters needed to configure it:
wal_level = 'logical'
max_wal_senders = 10                    # greater than the number of subscribers (or replicas)
max_replication_slots = 10              # greater than the number of subscribers (or replicas)
max_worker_processes = 10               # greater than the number of subscribers (or replicas)
max_logical_replication_workers = 10    # greater than the number of subscribers (or replicas)
max_sync_workers_per_subscription = 2   # depends on the number of tables being replicated
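After a restart, a quick way to verify that the settings have taken effect:

SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_level', 'max_wal_senders', 'max_replication_slots',
               'max_worker_processes', 'max_logical_replication_workers',
               'max_sync_workers_per_subscription');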
Tuning max_wal_senders
max_wal_senders must always be greater than the number of replicas. Each subscriber consumes one WAL sender process on the provider, so if the data is replicated to multiple sites, multiple WAL sender processes come into play. It is therefore important to ensure this parameter is set to an optimal number.
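A small sizing sketch for the provider; the value is illustrative and the change requires a restart:

-- How many WAL senders are currently in use?
SELECT count(*) AS active_wal_senders FROM pg_stat_replication;

-- Size the parameter with headroom above the subscriber count
ALTER SYSTEM SET max_wal_senders = 10;   -- illustrative value; takes effect after a restart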
Tuning max_replication_slots
In general, all the data changes occurring on the tables are written to WAL files in pg_xlog / pg_wal, termed WAL records. The WAL sender process picks up those WAL records (belonging to the tables being replicated) and sends them to the replicas, and the WAL receiver process on the replica applies those changes at the subscriber node.
The WAL files are removed from the pg_xlog / pg_wal location whenever a checkpoint occurs. If the WAL files are removed before the changes are applied to the subscriber node, then replication breaks and lags behind. In case the subscriber node lags behind, a replication slot ensures all the WAL files needed for the subscriber to get in sync with the provider are retained. It is recommended to configure one replication slot for each subscriber node.
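Note that CREATE SUBSCRIPTION creates its replication slot on the provider automatically. On PostgreSQL 10 and above, the slots and the WAL they hold back can be inspected like this:

SELECT slot_name, active, restart_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;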
Tuning max_worker_processes
It is important to have an optimal number of worker processes configured. This depends on how many processes the server can run, which in turn depends on the number of CPUs (or cores) available. max_worker_processes ensures multiple processes are spawned to get the job done faster by utilizing multiple CPU cores. When replicating data using Logical Replication, this parameter can help generate multiple worker processes to replicate the data faster. There is a specific parameter called max_logical_replication_workers which ensures multiple processes are used to copy the data.
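On PostgreSQL 10 and above, one way to see how the background worker pool is being used:

SELECT backend_type, count(*)
FROM pg_stat_activity
GROUP BY backend_type;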
Tuning max_logical_replication_workers
This parameter specifies the maximum number of logical replication worker processes used to perform table data replication and synchronization. These workers are taken from the pool defined by max_worker_processes, which should therefore be higher than this parameter's value. This parameter is very beneficial when replicating data to multiple sites in multi-CPU environments. The default is 4. The maximum value depends on how many worker processes the system supports.
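A minimal sketch (values illustrative) of keeping the two pools consistent on the subscriber; both changes require a restart:

ALTER SYSTEM SET max_worker_processes = 16;             -- overall background worker pool
ALTER SYSTEM SET max_logical_replication_workers = 8;   -- drawn from max_worker_processes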
Tuning max_sync_workers_per_subscription
This parameter specifies the maximum number of synchronization processes allowed per subscription. The synchronization processes run during the initial data copy, and this parameter can be used to ensure that happens faster. Currently, only one synchronization worker can run per table, but multiple tables can be synced in parallel during the initial copy. The default value is 2. These workers are taken from the pool defined by max_logical_replication_workers.
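During the initial copy, the synchronization workers can be observed on the subscriber; rows with a non-null relid are table synchronization workers:

SELECT subname, pid, relid
FROM pg_stat_subscription;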
Those are the parameters which must be tuned to ensure Logical Replication is efficient and fast. Other parameters that also affect Logical Replication are wal_receiver_timeout, wal_receiver_status_interval, and wal_retrieve_retry_interval. These parameters take effect on the subscriber node and have no effect on the provider node.
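Since these are subscriber-side settings, they can be changed without a restart; the values here are illustrative only:

ALTER SYSTEM SET wal_receiver_timeout = '1min';          -- drop the connection if the provider goes silent
ALTER SYSTEM SET wal_receiver_status_interval = '10s';   -- how often the subscriber reports status
ALTER SYSTEM SET wal_retrieve_retry_interval = '5s';     -- how long to wait before retrying WAL retrieval
SELECT pg_reload_conf();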
Conclusion
Replicating specific tables is a common requirement that arises in large and complex database systems, for example for business reporting or data warehousing purposes. As a DBA, I do believe Logical Replication greatly caters to such purposes due to its easy, low-complexity implementation. Configuring and tuning Logical Replication requires a good amount of planning, architecting, and testing. The amount of data being replicated in real time must be evaluated to ensure an efficient and seamless replication system is in place. To conclude: for databases running PostgreSQL 10 and above, Logical Replication is the way to go, and for databases running PostgreSQL versions below 10, pglogical is the option.