When reading PostgreSQL getting started, you see the line: “The PostgreSQL server can handle multiple concurrent connections from clients. To achieve this, it starts (“forks”) a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original postgres process. Thus, the master server process is always running, waiting for client connections, whereas client and associated server processes come and go.”
Brilliant idea. And yet it means that every new connection spins a new process, reserving RAM and possibly getting too heavy with multiple sessions. To avoid problems, postgres has max_connections setting with default 100 connections. Of course you can increase it, but such action would require restart (pg_settings.context is ‘postmaster’):
t=# select name,setting,short_desc,context from pg_settings where name = 'max_connections'; -[ RECORD 1 ]-------------------------------------------------- name | max_connections setting | 100 short_desc | Sets the maximum number of concurrent connections. context | postmaster
And even after increasing – at some point you might need more connections (of course urgently as always on running prod). Why increasing it is so uncomfortable? Because if it was comfy, you would probably end up with uncontrolled spontaneous increasing of the number until the cluster starts lagging. Meaning old connections are slower – so they take more time, so you need even more and more new. To avoid such possible avalanche and add some flexibility, we have superuser_reserved_connections – to be able to connect and fix problems with SU when max_connections is exhausted. And we obviously see the need of some connection pooler. As we want new connection candidates to wait in a queue instead of failing with exception FATAL: sorry, too many clients already and not risking the postmaster.
Connection pooling is offered at some level by many popular “clients”. You could use it with jdbc for quite a while. Recently node-postgres offered it’s own node-pg-pool. More or less the implementation is simple (as the idea is): pooler starts the connections towards the database and keeps them. The client connecting to db only gets a “shared” existing connection and after closing it, the connection goes back to the pool. We also have much more sophisticated software, like pgPool. And yet pgbouncer is an extremely popular choice for the task. Why? Because it does only the pooling part, but does it right. It’s free. It’s fairly simple to set up. And you meet it at most biggest service providers as recommended or used, eg citusdata, aws, heroku and other highly respected resources.
So let us look closer at what it can and how you use it. In my setup I use default pool_mode = transaction ([pgbouncer] section) which is a very popular choice. This way we not just queue the connections exceeding max_connections, but rather reuse sessions without waiting for the previous connection to close:
[databases] mon = host=22.214.171.124 port=5432 dbname=mon mons = host=126.96.36.199 port=5432 dbname=mon pool_mode = session pool_size=2 max_db_connections=2 monst = host=188.8.131.52 port=5432 dbname=mon pool_mode = statement [pgbouncer] listen_addr = 184.108.40.206 listen_port = 6432 unix_socket_dir = /tmp auth_file = /pg/pgbouncer/bnc_users.txt auth_type = hba auth_hba_file = /pg/pgbouncer/bnc_hba.conf admin_users = root vao pool_mode = transaction server_reset_query = RESET ALL; --DEALLOCATE ALL; /* custom */ ignore_startup_parameters = extra_float_digits application_name_add_host = 1 max_client_conn = 10000 autodb_idle_timeout = 3600 default_pool_size = 100 max_db_connections = 100 max_user_connections = 100 #server_reset_query_always = 1 #uncomment if you want older global behaviour
Short overview of the most popular settings and tips and tricks:
server_reset_query is very handy and important. In session pooling mode, it “wipes” previous session “artifacts”. Otherwise you would have problems with same names for prepared statements, session settings affecting next sessions and so on. The default is DISCARD ALL, that “resets” all session states. Yet you can choose more sophisticated values, e.g., RESET ALL; DEALLOCATE ALL; to forget only SET SESSION and prepared statements, keeping TEMP tables and plans “shared”. Or the opposite – you might want to make prepared statements “global” from any session. Such configuration is doable, though risky. You have to make pgbouncer reuse the session for all (thus making either very small pool size or avalanching the sessions), which is not completely reliable. Anyway – it is a useful ability. Especially in setups where you want client sessions to eventually (not immediately) change to configured pooled session settings. Very important point here is session pool mode. Before 1.6 this setting affected other pool modes as well, so if you relied on it, you need to use the new setting server_reset_query_always = 1. Probably at some point people will want server_reset_query to be even more flexible and configurable per db/user pair (and client_reset_query instead). But as of current writing, March 2018, it’s not an option. The idea behind making this setting valid by default for session mode only was – if you share connection on transaction or statement level – you cannot rely on the session setting at all.
Auth_type = hba. Before 1.7, the big problem with pgbouncer was the absence of host based authentication – “postgres firewall”. Of course you still had it for postgres cluster connection, but pgbouncer was “open” for any source. Now we can use the same hba.conf to limit connections for host/db/user based on connection network.
connect_query is not performed on every client “connection” to pgbouncer, but rather when pgbouncer connects to a Postgres instance. Thus you can’t use it for setting or overriding “default” settings. In session mode, other sessions do not affect each other and on disconnect, reset query discards all – so you don’t need to mess with it. In transaction pooling mode, you would hope to use it for settings overriding erroneously set by other sessions, but it won’t work, alas. Eg. you want to share prepared statement between “sessions” in transaction mode, so you set something like
trns = dbname=mon pool_mode = transaction connect_query = 'do $$ begin raise warning $w$%$w$, $b$new connection$b$; end; $$; prepare s(int) as select $1;'
and indeed – every new client sees the prepared statements (unless you left server_reset_query_always to on, so pgbouncer discards it on commit). But if some client runs DISCARD s; in its session, it affects all clients on this connection and new clients connecting to it won’t see prepared statements anymore. But if you want to have some initial setting for postgres connections coming from pgbouncer, then this is the place.
application_name_add_host was added in 1.6, it has similar limitation. It “puts” the client IP to application_name, so you can easily get your bad query source, but is easily overridden by simple set application_name TO ‘wasn’’t me’; Still you can “heal” this using views – follow this post to get the idea or even use these short instructions. Basically idea is that show clients; will show the clients IP, so you can query it directly from pgbouncer database on each select from pg_stat_activity to check if it’s reset. But of course using a simple setting is much simpler and cosier. Though it does not guarantee the result…
pool_mode can be specified both as default, per database and per user – making it very flexible. Mixing modes makes pgbouncer extremely effective for pooling. This is a powerful feature, but one has to be careful when using it. Often users use it without understanding the results to absolutely atomic mixes of per transaction/per session/per user/per database/global settings working differently for the same user or database, due to the different pooling modes with pgbouncer. This is the box of matches you don’t give to children without supervision. Also many other options are configurable for default and per db and per user.
Please don’t take it literally, but you can “compare” different sections of ini with SET and ALTER: SET LOCAL affects transactions and is good to use when poll_mode=transaction , SET SESSION affects sessions and is safe for use when poll_mode=session , ALTER USER SET affects roles and will interfere with pgbouncer.ini part of section [users], ALTER DATABASE SET affects databases and will interfere with pgbouncer.ini part of section [databases], ALTER SYSTEM SET or editing postgres.conf globally affects defaults and is comparable by effect to the default section of pgbouncer.ini.
Once again – use pool mode responsibly. Prepared statements or session wide settings will be a mess in transaction pooling mode. Same as SQL transaction makes no sense in statement pooling mode. Choose a suitable pooling mode for suitable connections. A good practice is creating roles with the idea that:
- some will run only fast selects, thus can share one session without transactions for a hundred of concurrent tiny not important selects.
- Some role members are safe for session level concurrency, and ALWAYS use transactions. Thus they can safely share several sessions for hundreds of concurrent transactions.
- Some roles are just too messy of complicated to share their session with others. So you use session pooling mode for them to avoid errors on connection when all “slots” are already taken.
Don’t use it instead of HAProxy or some other load balancer. Despite the fact that pgbouncer has several configurable features addressing what a load balancer addresses, like dns_max_ttl and you can set up a DNS configuration for it, most prod environments use HAProxy or some other load balancer for HA. This is because HAProxy is really good at load balancing across live servers in round robin fashion, better than pgbouncer. Although pgbouncer is better for postgres connection pooling, it might be better to use one small daemon that perfectly performs one task, instead of a bigger one that does two tasks, but worse.
Configuration changes can be tricky. Some changes to pgbouncer.ini require restart (listen_port and such), while others such as admin_users require reload or SIGHUP. Changes inside auth_hba_file require reload, while changes to auth_file do not.
The extremely short overview of settings above is limited by the format. I invite you to take a look at the complete list. Pgbouncer is the kind of software with very small amount of “boring settings” – they all have huge potential and are of amazing interest.
And lastly, moving from a short enthusiastic review to something where you might be less happy – the installation. The process is clearly described in this section of documentation. The only option described is building from git sources. But everybody knows there are packages! Trying both most popular:
sudo yum install pgbouncer sudo apt-get install pgbouncer
can work. But sometimes you have to do an extra step. E.g., when no pgbouncer package is available, try this.
sudo yum install pgbouncer Loaded plugins: priorities, update-motd, upgrade-helper amzn-main | 2.1 kB 00:00:00 amzn-updates | 2.5 kB 00:00:00 docker-ce-edge | 2.9 kB 00:00:00 docker-ce-stable | 2.9 kB 00:00:00 docker-ce-test | 2.9 kB 00:00:00 pgdg10 | 4.1 kB 00:00:00 pgdg95 | 4.1 kB 00:00:00 pgdg96 | 4.1 kB 00:00:00 pglogical | 3.0 kB 00:00:00 sensu | 2.5 kB 00:00:00 (1/3): pgdg96/x86_64/primary_db | 183 kB 00:00:00 (2/3): pgdg10/primary_db | 151 kB 00:00:00 (3/3): pgdg95/x86_64/primary_db | 204 kB 00:00:00 50 packages excluded due to repository priority protections Resolving Dependencies --> Running transaction check ---> Package pgbouncer.x86_64 0:1.8.1-1.rhel6 will be installed --> Processing Dependency: libevent2 >= 2.0 for package: pgbouncer-1.8.1-1.rhel6.x86_64 --> Processing Dependency: c-ares for package: pgbouncer-1.8.1-1.rhel6.x86_64 --> Processing Dependency: libcares.so.2()(64bit) for package: pgbouncer-1.8.1-1.rhel6.x86_64 --> Running transaction check ---> Package c-ares.x86_64 0:1.13.0-1.5.amzn1 will be installed ---> Package pgbouncer.x86_64 0:1.8.1-1.rhel6 will be installed --> Processing Dependency: libevent2 >= 2.0 for package: pgbouncer-1.8.1-1.rhel6.x86_64 --> Finished Dependency Resolution Error: Package: pgbouncer-1.8.1-1.rhel6.x86_64 (pgdg10) Requires: libevent2 >= 2.0 You could try using --skip-broken to work around the problem You could try running: rpm -Va --nofiles --nodigest
Of course adding pgdg to /etc/yum.repos.d/ won’t help anymore. Neither the –skip-broken or rpm -Va –nofiles –nodigest. A simple
sudo yum install libevent2 Loaded plugins: priorities, update-motd, upgrade-helper 50 packages excluded due to repository priority protections No package libevent2 available. Error: Nothing to do
would be too easy. So you have to build libevent2 yourself, bringing you back to the position when you have to compile things yourself. Either it is pgbouncer or one of its dependencies.
Again – digging too deep with the particularities of installation is out of scope. You should know you have a big chance to install it as package.
Lastly – questions like “why postgres does not offer a native session pooler” comes over and over. There are even very fresh suggestions and thoughts on it. But so far the most popular approach here is using pgbouncer.