Below is an excerpt from our whitepaper “PostgreSQL Management and Automation with ClusterControl” which can be downloaded for free.
Revision Note: Keep in mind that the terms used in this blog Master-Slave are synonymous of Master-Standby terms used by PostgreSQL. We’re using Master-Slave to keep the parallelism with others technologies.
For HA configuration we can have several architectures, but the basic ones would be master-slave and master-master architectures.Database servers can work together to allow a second server to take over quickly if the primary server fails (high availability), or to allow several computers to serve the same data (load balancing).
PostgreSQL Master-Slave Architectures
These architectures enable us to maintain an master database with one or more standby servers ready to take over operations if the primary server fails. These standby databases will remain synchronized (or almost synchronized) with the master.
The replication between the master and the slaves can be made via SQL statements (logical standbys) or via the internal data structure modifications (physical standbys). PostgreSQL uses a stream of write-ahead log (WAL) records to keep the standby databases synchronized. If the main server fails, the standby contains almost all of the data of the main server, and can be quickly made the new master database server. This can be synchronous or asynchronous and can only be done for the entire database Server.
Setting up streaming replication is a task that requires some steps to be followed thoroughly. For those steps and some more background on this subject, please see: Become a PostgreSQL DBA – How to Setup Streaming Replication for High Availability.
From version 10, PostgreSQL includes the option to setup logical replication.
Logical replication allows a database server to send a stream of data modifications to another server. PostgreSQL logical replication constructs a stream of logical data modifications from the WAL. Logical replication allows the data changes from individual tables to be replicated. It doesn’t require a particular server to be designated as a master or a replica but allows data to flow in multiple directions.
You can find more information regarding logical replication: Blog: An Overview of Logical Replication in PostgreSQL.
To effectively ensure high availability, it is not enough to have a master-slave architecture. We also need to enable some automatic form of failover, so if something fails we can have the smallest possible delay in resuming normal functionality. PostgreSQL does not include an automatic failover mechanism to identify failures on the master database and notify the salve to take ownership, so that will require a little bit of work on the DBA’s side. You should work on a script that includes the pg_ctl promote command, that will promote the slave as a new master. There are also some third party tools for this automation. Many such tools exist and are well integrated with the operating system facilities required for successful failover, such as IP address Migration.
After a failover happens, you need to modify your application accordingly to work with the new master. You will also have only one server working, so re-creation of the master-slave architecture needs to be done, so we get back to the same normal situation that we had before the issue.
PostgreSQL Master-Master Architectures
This architecture provides a way of minimizing the impact of an error in one of the nodes, as the other node can take care of all the traffic, maybe slightly affecting the performance, but never losing functionality. It is also used to accomplish (and maybe this is even a more interesting point) horizontal scalability (scale-out), opposite to the concept of vertical scalability where we add more resources to a server (scale-up).
For implementing this architecture, you will need to use external tools, as this feature is not (yet) natively supported by PostgreSQL.
You must be very careful when choosing a solution for implementing master-master, as there are many different products. A lot of them are still “green” , with few serious users or success cases. Some other projects have, on the other hand, been abandoned, as there are no active maintainers.
For more information on the available tools please refer to: Blog: Top PG Clustering HA Solutions for PostgreSQL.
Load Balancing and Connection Pooling
There are several load balancer tools that can be used to manage the traffic from your application to get the most of your database architecture. In the same way, there are some others that can help you manage the way the application connects to the database, by pooling these connections and reusing them between different requests.
There are some products that are used for both purposes, like the well known pgpool, and some others that will focus in only one of these features, like pgbouncer (connection pooling) and HAProxy (used for load balancing).