Using PostgreSQL Replication Slots

Paul Namuag

What are Replication Slots?

Back in the days when "Replication Slots" were not yet introduced, managing the WAL segments were a challenge. In standard streaming replication, the master has no knowledge of the slave status.  Take the example of a master that executes a large transaction, while a standby node is in maintenance mode for a couple of hours (such as upgrading the system packages, adjusting network security, hardware upgrade, etc.). At some point, the master removes its transaction log (WAL segments) as checkpoint passes. Once the slave is off maintenance, it possibly has a huge slave lag and has to catch up with the master. Eventually, the slave will get a fatal issue like below:

LOG:  started streaming WAL from primary at 0/73000000 on timeline 1

FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000000000000073 has already been removed

The typical approach is to specify in your postgresql.conf a WAL archival script that will copy WAL files to one or more long-term archive locations. If you don’t have any standbys or other streaming replication clients, then basically the server can discard the WAL file once the archive script is done or responds OK. But you’ll still need some recent WAL files for crash recovery (data from recent WAL files get replayed during crash recovery. In our example of a standby node which is placed for a long maintenance period, problems arise when it comes back online and asks the primary for a WAL file that the primary no longer has, then the replication fails.

This problem was addressed in PostgreSQL 9.4 via "Replication Slots".

If not using replication slots, a common way to reduce the risk of failing replication is to set the wal_keep_segments high enough so that WAL files that might be needed won't be rotated or recycled. The disadvantage of this approach is that it's hard to determine what value is best for your setup. You won't need maintenance on a daily basis or you won't need to retain a large pile of WAL files that eats your disk storage. While this works, it's not an ideal solution as risking disk space on the master can cause incoming transactions to fail.

Alternative approaches of not using replication slots is to configure PostgreSQL with continuous archiving and provide a restore_command to give the replica access to the archive. To avoid WAL build-up on the primary, you may use a separate volume or storage device for the WAL files, e.g., SAN or NFS. Another thing is with synchronous replication since it requires that primary has to wait for standby nodes to commit transaction. This means, it assures that WAL files have been applied to the standby nodes. But still, it's best that you provide archiving commands from the primary so that once WAL's are recycled in the primary, rest assured that you have WAL backups in case for recovery. Although in some situations, synchronous replication is not an ideal solution as it comes with some performance overhead as compared with asynchronous replication.

Types of Replication Slots

There are two types of replication slots. These are:

Physical Replication Slots 

Can be used for standard streaming replication. They will make sure that data is not recycled too early. 

Logical Replication Slots

Logical replication does the same thing as physical replication slots and are used for logical replication. However, they are used for logical decoding. The idea behind logical decoding is to give users a chance to attach to the transaction log and decode it with a plugin. It allows to extract changes made to the database and therefore to the transaction log in any format and for any purpose.

In this blog, we'll be using physical replication slots and how to achieve this using ClusterControl.

Advantages and Disadvantages of Using Replication Slots

Replications slots are definitely beneficial once enabled. By default, "Replication Slots" are not enabled and have to be set  manually. Among the advantages of using Replication Slots are

  • Ensures master retains enough WAL segments for all replicas to receive them
  • Prevents the master from removing rows that could cause recovery conflict on the replicas
  • A master can only recycle the transaction log once it has been consumed by all replicas. The advantage here is that a slave can never fall behind so much that a re-sync is needed.

Replication slots also come with some caveats.

  • An orphan replication slot can cause unbounded disk growth due to piled up WAL files from the master
  • Slave nodes placed under long maintenance (such as days or weeks) and that are tied to a replication slot will have unbounded disk growth due to piled up WAL files from the master

You can monitor this by querying pg_replication_slots to determine the slots that are not used. We'll check back on this a bit later.

Using Replication Slots 

As stated earlier, there are two types of replication slots. For this blog, we'll use physical replication slots for streaming replication.

Creating A Replication Slot

Creating a replication is simple. You need to invoke the existing function pg_create_physical_replication_slot to do this and has to be run and created in the master node. The function is simple,

maximus_db=# \df pg_create_physical_replication_slot

Schema              | pg_catalog

Name                | pg_create_physical_replication_slot

Result data type    | record

Argument data types | slot_name name, immediately_reserve boolean DEFAULT false, OUT slot_name name, OUT xlog_position pg_lsn

Type                | normal

e.g. Creating a replication slot named slot1,

postgres=# SELECT pg_create_physical_replication_slot('slot1');

-[ RECORD 1 ]-----------------------+---------

pg_create_physical_replication_slot | (slot1,)

The replication slot names and its underlying configuration is only system-wide and not cluster-wide. For example, if you have nodeA (current master), and standby nodes nodeB and nodeC, creating the slot on a master nodeA namely "slot1", then data will not be available to nodeB and nodeC. Therefore, when failover/switchover is about to happen, you need to re-create the slots you have created.

Dropping A Replication Slot

Unused replication slots have to be dropped or deleted. As stated earlier, when there are orphaned replication slots or slots that have not been assigned to any client or standby nodes, it can lead to boundless disk space issues if left undropped. So it is very important that these have to be dropped when it's no longer use. To drop it, simply invoke pg_drop_replication_slot. This function has the following definition:

maximus_db=# \df pg_drop_replication_slot

Schema              | pg_catalog

Name                | pg_drop_replication_slot

Result data type    | void

Argument data types | name

Type                | normal

Dropping it is simple:

maximus_db=# select pg_drop_replication_slot('slot2');

-[ RECORD 1 ]------------+-

pg_drop_replication_slot |

Monitoring Your PostgreSQL Replication Slots

Monitoring your replication slots is something that you don't want to miss. Just collect the information from view pg_replication_slots in the primary/master node just like below:

postgres=# select * from pg_replication_slots;

-[ RECORD 1 ]-------+-----------

slot_name           | main_slot

plugin              |

slot_type           | physical

datoid              |

database            |

active              | t

active_pid          | 16297

xmin                |

catalog_xmin        |

restart_lsn         | 2/F4000108

confirmed_flush_lsn |

-[ RECORD 2 ]-------+-----------

slot_name           | main_slot2

plugin              |

slot_type           | physical

datoid              |

database            |

active              | f

active_pid          |

xmin                |

catalog_xmin        |

restart_lsn         |

confirmed_flush_lsn |

The above result shows that the main_slot has been taken, but not main_slot2.

Another thing you can do is to monitor how much lag behind the slots you have. To achieve this, you can simply use the query based on the sample result below:

postgres=# SELECT redo_lsn, slot_name,restart_lsn, 

round((redo_lsn-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind 

FROM pg_control_checkpoint(), pg_replication_slots;

redo_lsn    | slot_name | restart_lsn | gb_behind 

------------+-----------+-------------+-----------

 1/8D400238 |     slot1 | 0/9A000000 | 3.80

But redo_lsn is not present in 9.6, shall use redo_location, so in 9.6,

imbd=# SELECT redo_location, slot_name,restart_lsn, 

round((redo_location-restart_lsn) / 1024 / 1024 / 1024, 2) AS GB_behind 

FROM pg_control_checkpoint(), pg_replication_slots;

-[ RECORD 1 ]-+-----------

redo_location | 2/F6008BE0

slot_name     | main_slot

restart_lsn   | 2/F6008CC0

gb_behind     | 0.00

-[ RECORD 2 ]-+-----------

redo_location | 2/F6008BE0

slot_name     | main_slot2

restart_lsn   | 2/F6008CC0

gb_behind     | 0.00

System Variable Requirements

Implementing replication slots requires manual setting. There are variables that you have to keep in mind that require changes and be specified in your postgresql.conf. See below:

  • max_replication_slots – If set to 0, this means that replication slots are totally disabled. If you're using PostgreSQL < 10 versions, this slot has to be specified other than 0 (default). Since PostgreSQL 10, the default is 10. This variable specifies the maximum number of replication slots. Setting it to a lower value than the number of currently existing replication slots will prevent the server from starting.
  • wal_level – must at least be replica or higher (replica is default). Setting hot_standby or archive will map to replica. For a physical replication slot, replica is enough. For logical replication slots, logical is preferred.
  • max_wal_senders – set to 10 by default, 0 in 9.6 version which means replication is disabled. We suggest you set this at least to 16 especially when running with ClusterControl.
  • hot_standby – in versions < 10, you need to set this to on which is off by default. This is important for standby nodes which means when on, you can connect and run queries during recovery or in standby mode.
  • primary_slot_name –  this variable is set via recovery.conf on the standby node. This is the slot to be used by the receiver or standby node when connecting with the sender (or primary/master).

You have to take note that these variables mostly require a database service restart in order to reload new values.

Using Replication Slots in a ClusterControl PostgreSQL Environment

Now, let’s see how we can use physical replication slots and implement them within a Postgres setup managed by ClusterControl.

Deploying of PostgreSQL Database Nodes

Let's start deploying a 3-node PostgreSQL Cluster using ClusterControl using PostgreSQL 9.6 version this time.

ClusterControl will deploy nodes with the following system variables defined accordingly based on their defaults or tuned up values. In:

postgres=# select name, setting from pg_settings where name in ('max_replication_slots', 'wal_level', 'max_wal_senders', 'hot_standby');

         name          | setting 

-----------------------+---------

 hot_standby           | on

 max_replication_slots | 0

 max_wal_senders       | 16

 wal_level             | replica

(4 rows)

In versions PostgreSQL > 9.6, max_replication_slots default value is 10 which is enabled by default but not in 9.6 or lower versions which is disabled by default. You need to assign max_replication_slots higher than 0. In this example, I set max_replication_slots to 5.

[email protected]:~# grep 'max_replication_slots' /etc/postgresql/9.6/main/postgresql.conf 

# max_replication_slots = 0                     # max number of replication slots

max_replication_slots = 5

and restarted the service,

[email protected]:~# pg_lsclusters 

Ver Cluster Port Status Owner    Data directory Log file

9.6 main    5432 online postgres /var/lib/postgresql/9.6/main pg_log/postgresql-%Y-%m-%d_%H%M%S.log



[email protected]:~# pg_ctlcluster 9.6 main restart

Setting The Replication Slots For Primary and Standby Nodes

There's no option in ClusterControl to do this, so you have to create your slots manually. In this example, I created the slots in the primary in host 192.168.30.100:

192.168.10.100:5432 [email protected]_db=# SELECT pg_create_physical_replication_slot('slot1'), pg_create_physical_replication_slot('slot2');

 pg_create_physical_replication_slot | pg_create_physical_replication_slot 

-------------------------------------+-------------------------------------

 (slot1,)                            | (slot2,)

(1 row)

Checking what we have just created shows,

192.168.10.100:5432 [email protected]_db=# select * from pg_replication_slots;

 slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 

-----------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------

 slot1     | | physical  | | | f      | | |       | | 

 slot2     | | physical  | | | f      | | |       | | 

(2 rows)

Now in the standby nodes, we need to update the recovery.conf and add the variable primary_slot_name and change the application_name so it's easier to identify the node. Here's how it looks like in host 192.168.30.110 recovery.conf: 

[email protected]:/var/lib/postgresql/9.6/main/pg_log# cat ../recovery.conf 

standby_mode = 'on'

primary_conninfo = 'application_name=node11 host=192.168.30.100 port=5432 user=cmon_replication password=m8rLmZxyn23Lc2Rk'

recovery_target_timeline = 'latest'

primary_slot_name = 'slot1'

trigger_file = '/tmp/failover_5432.trigger'

Doing the same thing as well in host 192.168.30.120 but changed the application_name and set the primary_slot_name = 'slot2'.

Checking the replication slot health:

192.168.10.100:5432 [email protected]_db=# select * from pg_replication_slots;

 slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 

-----------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------

 slot1     | | physical  | | | t      | 24252 | |       | 0/CF0A4218 | 

 slot2     | | physical  | | | t      | 11635 | |       | 0/CF0A4218 | 

(2 rows)

What Else Do You Need?

Since ClusterControl doesn't support Replication Slots as of this time, there are things that you need to take into account. What are these? Let's go into details.

Failover/Switchover Process

When an auto failover or switchover via ClusterControl has been attempted, slots will not be retained from the primary and on the standby nodes. You need to re-create this manually, check the variables if set correctly, and modify the recovery.conf accordingly.

Rebuilding a Slave from a Master

When rebuilding a slave, the recovery.conf will not be retained. This means that your recovery.conf settings having the primary_slot_name will be erased. You need to specify this manually again and check the pg_replication_slots view to determine if slots are properly used or left orphaned.

If you want to rebuild the slave/standby node from a master, you might have to consider specifying the PGAPPNAME env variable just like the command below:

$ export PGAPPNAME="app_repl_testnode15"; /usr/pgsql-9.6/bin/pg_basebackup -h 192.168.10.190 -U cmon_replication -D /var/lib/pgsql/9.6/data -p5434 -W -S main_slot -X s -R -P

Specifying the -R param is very important so it will re-create the recovery.conf, while -S shall specify what slot name to use when rebuilding the standby node.

Conclusion

Implementing the Replication Slots in PostgreSQL is straightforward yet there are certain caveats that you must remember. When deploying with ClusterControl, you’ll need to update some settings during failover or slave rebuilds.

ClusterControl
The only management system you’ll ever need to take control of your open source database infrastructure.