blog

How to Achieve PostgreSQL High Availability with pgBouncer

Sebastian Insausti

Published: September 14, 2020
Last Updated: May 4, 2022

In the database world, there are many common concepts like High Availability, Failover, and Connection pooling. All of them are useful things to implement on any system, and even a must in some cases.

A connection pooling is a method of creating a pool of connections and reuse them avoiding opening new connections to the database all the time, which will increase the performance of your applications considerably. PgBouncer is a popular connection pooler designed for PostgreSQL, but it is not enough to achieve PostgreSQL High Availability by itself as it doesn’t have multi-host configuration, failover, or detection.

Using a Load Balancer is a way to have High Availability in your database topology. It could be useful for redirecting traffic to healthy database nodes, distribute the traffic across multiple servers to improve performance, or just to have a single endpoint configured in your application for an easier configuration and failover process. For this, HAProxy is a good option to complement your connection pooler, as it is an open-source proxy that can be used to implement high availability, load balancing, and proxying for TCP and HTTP based applications.

In this blog, we will use both concepts, Load Balancer and Connection pooling (HAProxy + PgBouncer), to deploy a High Availability environment for your PostgreSQL database.

How PgBouncer Works

PgBouncer acts as a PostgreSQL server, so you just need to access your database using the PgBouncer information (IP Address/Hostname and Port), and PgBouncer will create a connection to the PostgreSQL server, or it will reuse one if it exists.

When PgBouncer receives a connection, it performs the authentication, which depends on the method specified in the configuration file. PgBouncer supports all the authentication mechanisms that the PostgreSQL server supports. After this, PgBouncer checks for a cached connection, with the same username+database combination. If a cached connection is found, it returns the connection to the client, if not, it creates a new connection. Depending on the PgBouncer configuration and the number of active connections, it could be possible that the new connection is queued until it can be created, or even aborted.

The PgBouncer behavior depends on the pooling mode configured:

session pooling (default): When a client connects, a server connection will be assigned to it for the whole duration the client stays connected. When the client disconnects, the server connection will be put back into the pool.
transaction pooling: A server connection is assigned to a client only during a transaction. When PgBouncer notices that the transaction is over, the server connection will be put back into the pool.
statement pooling: The server connection will be put back into the pool immediately after a query completes. Multi-statement transactions are disallowed in this mode as they would break.

To balance queries between several servers, on the PgBouncer side, it may be a good idea to make server_lifetime smaller and also turn server_round_robin on. By default, idle connections are reused by the LIFO algorithm, which may work not so well when a load-balancer is used.

How to Install PgBouncer

We will assume you have your PostgreSQL cluster and HAProxy deployed, and it is up and running, otherwise, you can follow this blog post to easily deploy PostgreSQL for High Availability.

You can install PgBouncer on each database node or on an external machine, in any case, you will have something like this:

To get the PgBouncer software you can go to the PgBouncer download section, or use the RPM or DEB repositories. For this example, we will use CentOS 8 and will install it from the official PostgreSQL repository.

First, download and install the corresponding repository from the PostgreSQL site (if you don’t have it in place yet):

$ wget https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm

$ rpm -Uvh pgdg-redhat-repo-latest.noarch.rpm

Then, install the PgBouncer package:

$ yum install pgbouncer

Verify the installation:

$ pgbouncer --version

PgBouncer 1.14.0

libevent 2.1.8-stable

adns: c-ares 1.13.0

tls: OpenSSL 1.1.1c FIPS  28 May 2019

When it is completed, you will have a new configuration file located in /etc/pgbouncer/pgbouncer.ini:

[databases]

[users]

[pgbouncer]

logfile = /var/log/pgbouncer/pgbouncer.log

pidfile = /var/run/pgbouncer/pgbouncer.pid

listen_addr = 127.0.0.1

listen_port = 6432

auth_type = trust

auth_file = /etc/pgbouncer/userlist.txt

admin_users = postgres

stats_users = stats, postgres

Let’s see these parameters one by one:

Databases section [databases]: This contains key=value pairs where the key will be taken as a database name and the value as a libpq connection string style list of key=value pairs.
User section [users]: This contains key=value pairs where the key will be taken as a user name and the value as a libpq connection string style list of key=value pairs of configuration settings specific for this user.
logfile: Specifies the log file. The log file is kept open, so after rotation kill -HUP or on console RELOAD; should be done.
pidfile: Specifies the PID file. Without the pidfile set, the daemon is not allowed.
listen_addr: Specifies a list of addresses where to listen for TCP connections. You may also use * meaning “listen on all addresses”. When not set, only Unix socket connections are accepted.
listen_port: Which port to listen on. Applies to both TCP and Unix sockets. The default port is 6432.
auth_type: How to authenticate users.
auth_file: The name of the file to load usernames and passwords from.
admin_users: Comma-separated list of database users that are allowed to connect and run all commands on the console.
stats_users: Comma-separated list of database users that are allowed to connect and run read-only queries on the console.

This is just a sample of the default configuration file, as the original has 359 lines, but the rest of the lines are commented out by default. To get all the available parameters, you can check the official documentation.

How to Use PgBouncer

Now, let’s see a basic configuration to make it work.

The pgbouncer.ini configuration file:

$ cat /etc/pgbouncer/pgbouncer.ini

[databases]

world = host=127.0.0.1 port=5432 dbname=world

[pgbouncer]

logfile = /var/log/pgbouncer/pgbouncer.log

pidfile = /var/run/pgbouncer/pgbouncer.pid

listen_addr = *

listen_port = 6432

auth_type = md5

auth_file = /etc/pgbouncer/userlist.txt

admin_users = admindb

And the authentication file:

$ cat /etc/pgbouncer/userlist.txt

"admindb" "root123"

So, in this case, I have installed PgBouncer in the same database node, listening in all IP addresses, and it connects to a PostgreSQL database called “world”. I am also managing the allowed users in the userlist.txt file with a plain-text password that can be encrypted if needed.

To start the PgBouncer service, you just need to run the following command:

$ pgbouncer -d /etc/pgbouncer/pgbouncer.ini

Where -d means “daemon”, so it will run in the background.

$ netstat -pltn

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name

tcp        0      0 0.0.0.0:6432            0.0.0.0:*               LISTEN      4274/pgbouncer

tcp6       0      0 :::6432                 :::*                    LISTEN      4274/pgbouncer

As you can see, PgBouncer is up and waiting for connections in the port 6432. To access the PostgreSQL database, run the following command using your local information (port, host, username, and database name):

$ psql -p 6432 -h 127.0.0.1 -U admindb world

Password for user admindb:

psql (12.4)

Type "help" for help.



world=#

Keep in mind that the database name (world) is the database configured in your PgBouncer configuration file:

[databases]

world = host=127.0.0.1 port=5432 dbname=world

Monitoring and Managing PgBouncer

Instead of accessing your PostgreSQL database, you can connect directly to PgBouncer to manage or monitor it. For this, use the same command that you used previously, but change the database to “pgbouncer”:

$ psql -p 6432 -h 127.0.0.1 -U admindb pgbouncer

Password for user admindb:

psql (12.4, server 1.14.0/bouncer)

Type "help" for help.



pgbouncer=# SHOW HELP;

NOTICE:  Console usage

DETAIL:

SHOW HELP|CONFIG|DATABASES|POOLS|CLIENTS|SERVERS|USERS|VERSION

SHOW FDS|SOCKETS|ACTIVE_SOCKETS|LISTS|MEM

SHOW DNS_HOSTS|DNS_ZONES

SHOW STATS|STATS_TOTALS|STATS_AVERAGES|TOTALS

SET key = arg

RELOAD

PAUSE []

RESUME []

DISABLE 

ENABLE 

RECONNECT []

KILL 

SUSPEND

SHUTDOWN



SHOW

Now, you can run different PgBouncer commands to monitor it:

SHOW STATS_TOTALS:

pgbouncer=# SHOW STATS_TOTALS;

 database  | xact_count | query_count | bytes_received | bytes_sent | xact_time | query_time | wait_time

-----------+------------+-------------+----------------+------------+-----------+------------+-----------

 pgbouncer |          1 |           1 |              0 |          0 |         0 |          0 |         0

 world     |          2 |           2 |             59 |     234205 |      8351 |       8351 |      4828

(2 rows)

SHOW SERVERS:

pgbouncer=# SHOW SERVERS;

 type |  user   | database | state  |   addr    | port | local_addr | local_port |      connect_time       |      request_time

| wait | wait_us | close_needed |      ptr       |      link      | remote_pid | tls

------+---------+----------+--------+-----------+------+------------+------------+-------------------------+-------------------------

+------+---------+--------------+----------------+----------------+------------+-----

 S    | admindb | world    | active | 127.0.0.1 | 5432 | 127.0.0.1  |      45052 | 2020-09-09 18:31:57 UTC | 2020-09-09 18:32:04 UTC

|    0 |       0 |            0 | 0x55b04a51b3d0 | 0x55b04a514810 |       5738 |

(1 row)

SHOW CLIENTS:

pgbouncer=# SHOW CLIENTS;

 type |  user   | database  | state  |   addr    | port  | local_addr | local_port |      connect_time       |      request_time

  | wait | wait_us | close_needed |      ptr       |      link      | remote_pid | tls

------+---------+-----------+--------+-----------+-------+------------+------------+-------------------------+-----------------------

--+------+---------+--------------+----------------+----------------+------------+-----

 C    | admindb | pgbouncer | active | 127.0.0.1 | 46950 | 127.0.0.1  |       6432 | 2020-09-09 18:29:46 UTC | 2020-09-09 18:55:11 UT

C | 1441 |  855140 |            0 | 0x55b04a5145e0 |                |          0 |

 C    | admindb | world     | active | 127.0.0.1 | 47710 | 127.0.0.1  |       6432 | 2020-09-09 18:31:41 UTC | 2020-09-09 18:32:04 UT

C |    0 |       0 |            0 | 0x55b04a514810 | 0x55b04a51b3d0 |          0 |

(2 rows)

SHOW POOLS:

pgbouncer=# SHOW POOLS;

 database  |   user    | cl_active | cl_waiting | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us | pool_

mode

-----------+-----------+-----------+------------+-----------+---------+---------+-----------+----------+---------+------------+------

-----

 pgbouncer | pgbouncer |         1 |          0 |         0 |       0 |       0 |         0 |        0 |       0 |          0 | state

ment

 world     | admindb   |         1 |          0 |         1 |       0 |       0 |         0 |        0 |       0 |          0 | sessi

on

(2 rows)

And to manage it…

RELOAD:

pgbouncer=# RELOAD;

RELOAD

PAUSE:

pgbouncer=# PAUSE world;

PAUSE

RESUME:

pgbouncer=# RESUME world;

RESUME

Those commands are just an example. For a complete list of commands, please refer to the official documentation.

Conclusion

Using a combination of PgBouncer + HAProxy + PostgreSQL is a good way to achieve High Availability for your PostgreSQL cluster improving your database performance at the same time.

As you can see, if you have your PostgreSQL environment in place, which you can deploy using ClusterControl in just a few clicks, you can easily add PgBouncer to take advantage of having a connection pooler for your systems.

Why Cloud Repatriation Matters Now More Than Ever

Automating Day 2 operations: Scaling, upgrades and maintenance

PostgreSQL Bi-Directional Logical Replication — A Deep Dive

Elasticsearch performance optimization