A Guide to Pgpool for PostgreSQL: Part One

Vladimir Svedov

Published: April 17, 2018
Last Updated: May 4, 2022

Pgpool is less relevant today than it was 10 years ago, when it was a default part of a production PostgreSQL setup. Often, when somebody talked about a PostgreSQL cluster, they meant PostgreSQL behind pgpool and not the PostgreSQL instance itself (which is the right term). Pgpool is recognised by the most influential Postgres players: postgresql community, commandprompt, 2ndquadrant, EDB, citusdata, postgrespro (ordered by age, not influence). I realize the level of recognition in my links varies widely; I just want to emphasize the overall impact of pgpool in the Postgres world. Some of the best-known current Postgres “vendors” were founded after pgpool was already famous. So what makes it so famous?

Just the list of its most in-demand features makes it look great:

native replication
connection pooling
load balancing for read scalability
high availability (watchdog with virtual IP, online recovery & failover)

Well, let’s make a sandbox and play. My sample setup uses master/slave mode. I would assume it is the most popular mode today, because you typically use streaming replication together with load balancing. Replication mode is barely used these days. Most DBAs skip it in favour of streaming replication and pglogical, and previously Slony.

The replication mode has many interesting settings and surely interesting functionality. But most DBAs already have a master/multi-slave setup by the time they get to pgpool. So they are looking for automatic failover and a load balancer, and pgpool offers both out of the box for existing master/multi-slave environments. Not to mention that as of Postgres 9.4 streaming replication works with no major bugs, and from version 10 replication of hash indexes is supported, so there is barely anything to stop you from using it. Also, streaming replication is asynchronous by default (configurable to synchronous, and even to complicated, not “linear” synchronization setups), while native pgpool replication is synchronous (which means slower data changes) with no other option. Additional limitations apply as well. The pgpool manual itself suggests preferring streaming replication over pgpool's native replication when possible. And so this is my choice here.

Ah, but first we need to install it – right?

Installation (of a higher version on Ubuntu).

First, check the Ubuntu version with lsb_release -a. For me, the repo is:

root@u:~# sudo add-apt-repository 'deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main'
root@u:~# wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc |
>   sudo apt-key add -
OK
root@u:~# sudo apt-get update

Lastly, the installation itself:

sudo apt-get install pgpool2=3.7.2-1.pgdg16.04+1

Config:

I use the default config for the recommended mode:

zcat /usr/share/doc/pgpool2/examples/pgpool.conf.sample-stream.gz > /etc/pgpool2/pgpool.conf

Starting:

If you missed something in the config, you will see:

2018-03-22 13:52:53.284 GMT [13866] FATAL:  role "nobody" does not exist

Ah, true, my bad, but it is easily fixable (doable blindly with a one-liner if you want the same user for all health checks and recovery):

root@u:~# sed -i s/'nobody'/'pgpool'/g /etc/pgpool2/pgpool.conf
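The one-liner above rewrites every occurrence of the word, wherever it appears in the file. As a minimal sketch of a more targeted alternative, the Python below replaces the role only where it is the quoted value of a setting, and reports which settings it touched. The sample text, the comment line in it, and the function name replace_role are hypothetical; the three setting names are my assumption about what the sample config contains, so check your own file.

```python
import re

# Hypothetical excerpt of a pgpool.conf; the real sample file is much longer.
SAMPLE = """\
# checks run as nobody unless configured otherwise
sr_check_user = 'nobody'
health_check_user = 'nobody'
recovery_user = 'nobody'
"""


def replace_role(conf_text, old, new):
    """Replace a quoted role name only where it is the value of a setting.

    Returns the new text and the list of setting names that were changed.
    """
    pattern = re.compile(
        r"^([ \t]*\w+[ \t]*=[ \t]*)'" + re.escape(old) + r"'",
        re.MULTILINE,
    )
    # Collect the setting names (text before '=') for every match.
    changed = [m.group(1).split("=")[0].strip() for m in pattern.finditer(conf_text)]
    # A function replacement avoids any escaping issues in the new value.
    new_text = pattern.sub(lambda m: m.group(1) + "'" + new + "'", conf_text)
    return new_text, changed


new_text, changed = replace_role(SAMPLE, "nobody", "pgpool")
print(changed)
print(new_text)
```

Unlike the blind sed, this leaves the comment line alone, which makes it easier to review what actually changed before writing the file back.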

And before we go any further, let’s create the database pgpool and the user pgpool in all clusters (in my sandbox they are master, failover and slave, so I need to run it on the master only):

t=# create database pgpool;
CREATE DATABASE
t=# create user pgpool;
CREATE ROLE

At last – starting:

postgres@u:~$ /usr/sbin/service pgpool2 start
postgres@u:~$ /usr/sbin/service pgpool2 status
pgpool2.service - pgpool-II
   Loaded: loaded (/lib/systemd/system/pgpool2.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2018-04-09 10:25:16 IST; 4h 14min ago
     Docs: man:pgpool(8)
  Process: 19231 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
 Main PID: 8770 (pgpool)
    Tasks: 10
   Memory: 5.5M
      CPU: 18.250s
   CGroup: /system.slice/pgpool2.service
           ├─ 7658 pgpool: wait for connection reques
           ├─ 7659 pgpool: wait for connection reques
           ├─ 7660 pgpool: wait for connection reques
           ├─ 8770 /usr/sbin/pgpool -n
           ├─ 8887 pgpool: PCP: wait for connection reques
           ├─ 8889 pgpool: health check process(0
           ├─ 8890 pgpool: health check process(1
           ├─ 8891 pgpool: health check process(2
           ├─19915 pgpool: postgres t ::1(58766) idl
           └─23730 pgpool: worker proces

Great, so we can proceed to the first feature: load balancing. It has some requirements to be used, supports hints (e.g. to balance within the same session), has black- and white-listed functions, and has a regular-expression-based redirect preference list. It is sophisticated. Alas, going thoroughly over all that functionality would be out of the scope of this blog, so we will check only the simplest demos:

First, something very simple will show which node is used for select (in my setup, master spins on 5400, slave on 5402 and failover on 5401, while pgpool itself is on 5433, as I have another cluster running and did not want to interfere with it):

vao@u:~$ psql -h localhost -p 5433 t -c "select current_setting('port') from ts limit 1"
 current_setting
-----------------
 5400
(1 row)

Then in a loop:

vao@u:~$ (for i in $(seq 1 99); do psql -h localhost -p 5433 t -c "select current_setting('port') from ts limit 1" -XAt; done) | sort| uniq -c
      9 5400
     30 5401
     60 5402

Great. It definitely balances load between nodes, but it does not seem to balance equally. Maybe it’s so smart it knows the weight of each statement? Let’s compare the distribution with the expected results:

t=# show pool_nodes;
 node_id | hostname  | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | localhost | 5400 | up     | 0.125000  | primary | 122        | false             | 0
 1       | localhost | 5401 | up     | 0.312500  | standby | 169        | false             | 0
 2       | localhost | 5402 | up     | 0.562500  | standby | 299        | true              | 0
(3 rows)

No, pgpool does not analyze the weight of statements; it was a DBA with her settings again! The settings (see the lb_weight attribute) reconcile with the actual query destinations. You can easily change the distribution (as we did here) by changing the corresponding setting, e.g.:

root@u:~$ grep weight /etc/pgpool2/pgpool.conf
backend_weight0 =0.2
backend_weight1 = 0.5
backend_weight2 = 0.9
root@u:~# sed -i s/'backend_weight2 = 0.9'/'backend_weight2 = 0.2'/ /etc/pgpool2/pgpool.conf
root@u:~# grep backend_weight2 /etc/pgpool2/pgpool.conf
backend_weight2 = 0.2
root@u:~# pgpool reload
root@u:~$ (for i in $(seq 1 9); do psql -h localhost -p 5433 t -c "select current_setting('port') from ts limit 1" -XAt; done) | sort| uniq -c
      6 5401
      3 5402
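The lb_weight column can be reproduced from the backend_weight settings with simple arithmetic: each weight divided by the sum of all weights. The sketch below assumes that this plain normalization is all that happens, which matches the numbers shown above. The function names are mine, not pgpool's.

```python
def normalize_weights(weights):
    """Turn raw backend_weight values into shares that sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def expected_counts(weights, num_queries):
    """Expected number of SELECTs per node for a given number of queries."""
    return [share * num_queries for share in normalize_weights(weights)]


# Weights before the change: 0.2, 0.5, 0.9 for nodes 0, 1 and 2.
before = normalize_weights([0.2, 0.5, 0.9])
print([round(w, 6) for w in before])  # [0.125, 0.3125, 0.5625]

# Expected split of the 99-query loop under those weights.
print([round(c, 1) for c in expected_counts([0.2, 0.5, 0.9], 99)])

# Weights after backend_weight2 was lowered to 0.2.
after = normalize_weights([0.2, 0.5, 0.2])
print([round(w, 6) for w in after])
```

For 99 queries the expected split is roughly 12, 31 and 56, close to the observed 9, 30 and 60. With only 9 queries, as in the last loop, the sample is too small to say much, so a node getting no hits at all is not surprising.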

Great! The next great feature offered is connection pooling. With 3.5 the “thundering herd problem” is solved by serializing accept() calls, greatly speeding up “client connection” time. And yet this feature is pretty straightforward. It does not offer several levels of pooling or several pools configured for the same database (though pgpool does let you choose where to run selects with the database_redirect_preference_list load balancing setting), or other flexible features offered by pgBouncer.

So, a short demo:

t=# select pid,usename,backend_type, state, left(query,33) from pg_stat_activity where usename='vao' and pid <> pg_backend_pid();
 pid  | usename |  backend_type  | state |     left
------+---------+----------------+-------+--------------
 8911 | vao     | client backend | idle  |  DISCARD ALL
 8901 | vao     | client backend | idle  |  DISCARD ALL
 7828 | vao     | client backend | idle  |  DISCARD ALL
 8966 | vao     | client backend | idle  |  DISCARD ALL
(4 rows)

Hm, did I really configure so few children?

t=# pgpool show num_init_children;
 num_init_children
-------------------
 4
(1 row)

Ah, true, I lowered it from the default of 32 so the output would not take several pages. Well then, let’s try exceeding the number of sessions (below I open postgres sessions asynchronously in a loop, so the 6 sessions are requested at more or less the same time):

vao@u:~$ for i in $(seq 1 6); do (psql -h localhost -p 5433 t -U vao -c "select pg_backend_pid(), pg_sleep(1), current_setting('port'), clock_timestamp()" &);  done
vao@u:~$  pg_backend_pid | pg_sleep | current_setting |        clock_timestamp
----------------+----------+-----------------+-------------------------------
           8904 |          | 5402            | 2018-04-10 12:46:55.626206+01
(1 row)

 pg_backend_pid | pg_sleep | current_setting |        clock_timestamp
----------------+----------+-----------------+-------------------------------
           9391 |          | 5401            | 2018-04-10 12:46:55.630175+01
(1 row)

 pg_backend_pid | pg_sleep | current_setting |       clock_timestamp
----------------+----------+-----------------+------------------------------
           8911 |          | 5400            | 2018-04-10 12:46:55.64933+01
(1 row)

 pg_backend_pid | pg_sleep | current_setting |        clock_timestamp
----------------+----------+-----------------+-------------------------------
           8904 |          | 5402            | 2018-04-10 12:46:56.629555+01
(1 row)

 pg_backend_pid | pg_sleep | current_setting |        clock_timestamp
----------------+----------+-----------------+-------------------------------
           9392 |          | 5402            | 2018-04-10 12:46:56.633092+01
(1 row)

 pg_backend_pid | pg_sleep | current_setting |       clock_timestamp
----------------+----------+-----------------+------------------------------
           8910 |          | 5402            | 2018-04-10 12:46:56.65543+01
(1 row)

It lets sessions in three at a time. That is expected, as one slot is taken by the session above (selecting from pg_stat_activity), so 4-1=3. As soon as pg_sleep finishes its one-second nap and the session is closed, the next one is let in. So after the first three end, the next three step in. What happens to the rest? They are queued until a connection slot frees up. Then the process described for serialize_accept happens and the client gets connected.

Huh? Just connection pooling in session mode? Is that all? No, this is where caching steps in! Look:
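The two waves in the output above can be modelled with a few lines of Python. This is a toy model, not pgpool's code: it assumes all clients arrive at time zero, each holds a slot for a fixed time, and a waiting client takes whichever slot frees up first. The function name and parameters are hypothetical.

```python
import heapq


def finish_times(num_clients, free_slots, hold_seconds):
    """Return the finish time of each client, in the order they are served.

    All clients are assumed to arrive at t=0 and to hold a slot for
    hold_seconds; clients beyond free_slots wait for the earliest free slot.
    """
    # Each heap entry is the time at which one slot becomes available.
    slots = [0.0] * free_slots
    heapq.heapify(slots)
    finished = []
    for _ in range(num_clients):
        start = heapq.heappop(slots)   # earliest available slot
        end = start + hold_seconds     # client sleeps, then disconnects
        finished.append(end)
        heapq.heappush(slots, end)     # slot is free again at 'end'
    return finished


# 6 clients, num_init_children = 4 with one slot already in use, pg_sleep(1).
print(finish_times(6, 3, 1.0))  # [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```

The result mirrors the clock_timestamp values in the demo: three sessions complete about one second after the start and the other three about one second later.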

postgres=# /*NO LOAD BALANCE*/ select 1;
 ?column?
----------
        1
(1 row)

Checking pg_stat_activity:

postgres=# select pid, datname, state, left(query,33),state_change::time(0), now()::time(0) from pg_stat_activity where usename='vao' and query not like '%DISCARD%';
  pid  | datname  | state |               left                | state_change |   now
-------+----------+-------+-----------------------------------+--------------+----------
 15506 | postgres | idle  | /*NO LOAD BALANCE*/ select 1, now | 13:35:44     | 13:37:19
(1 row)

Then run the first statement again and observe that state_change does not change, which means you don’t even reach the database to get an already known result! Of course, if the query includes a non-immutable function, the results won’t be cached. Experiment with:

postgres=# /*NO LOAD BALANCE*/ select 1, now();
 ?column? |             now
----------+------------------------------
        1 | 2018-04-10 13:35:44.41823+01
(1 row)

You will find that state_change changes as does the result.

One last point here: why /*NO LOAD BALANCE*/? To be sure that we check pg_stat_activity on the master and run the query on the master as well. In the same way, you can use the /*NO QUERY CACHE*/ hint to avoid getting a cached result.

Already a lot for a short review? But we did not even touch the HA part! And many users look to pgpool specifically for this feature. Well, this is not the end of the story; this is the end of part one. Part two is coming, where we will briefly cover HA and some other tips on using pgpool…
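To make the caching behaviour above concrete, here is a toy model of a result cache keyed by the query text. It is only a sketch of the idea, not how pgpool implements it: the class name, the backend callable and the hard-coded list of function names are all hypothetical, and it assumes the query cache is switched on in the first place.

```python
class ToyQueryCache:
    """Cache SELECT results by query text, skipping uncacheable queries."""

    # Hypothetical stand-in for "functions whose result can change per call".
    NON_IMMUTABLE = ("now()", "random()", "clock_timestamp()")

    def __init__(self, backend):
        self.backend = backend      # callable that really runs the query
        self.store = {}             # query text -> cached result
        self.backend_calls = 0      # how many times the database was reached

    def is_cacheable(self, sql):
        if "/*NO QUERY CACHE*/" in sql:
            return False
        lowered = sql.lower()
        return not any(name in lowered for name in self.NON_IMMUTABLE)

    def execute(self, sql):
        cacheable = self.is_cacheable(sql)
        if cacheable and sql in self.store:
            return self.store[sql]  # served without touching the database
        self.backend_calls += 1
        result = self.backend(sql)
        if cacheable:
            self.store[sql] = result
        return result


# A fake backend that just echoes the query, enough to count round trips.
cache = ToyQueryCache(backend=lambda sql: "result of " + sql)
cache.execute("select 1")
cache.execute("select 1")           # second run is a cache hit
cache.execute("select 1, now()")
cache.execute("select 1, now()")    # never cached, reaches the backend again
print(cache.backend_calls)          # 3
```

The counter plays the role of state_change in pg_stat_activity: it stays put when a repeated query is answered from the cache and moves whenever the backend is actually reached.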
