In the previous part I dared to play with a feature that is not implemented, fantasising about how it would work. Well, HA is first of all a matter of design, and only then implementation. That does not excuse a bad implementation, nor does it make naive design look smart. Yet even after you have covered all possible scenarios and found an adequate rule for most cases, sometimes a very primitive small change can ruin the stronghold. Below I want to sandbox that.
What Happens When pgpool Should Failover, But Can’t?
When the health check fails for the master, the failover_command is fired to degenerate all nodes or to promote the next slave to primary. Sounds solid. But what if the command itself fails, e.g. the ssh connection fails (say, because another, bad, admin removed the key from ~/.ssh/authorized_keys)? What do we have then?
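The behaviour described here is driven by a handful of pgpool.conf settings; a minimal sketch (the script path is an assumption, the parameter names and failover_command placeholders are pgpool-II's own):

```
health_check_period      = 10    # seconds between health checks
health_check_timeout     = 20    # give up on a single health check after this
health_check_max_retries = 0     # retries before the node is degenerated
health_check_retry_delay = 1     # pause between retries
# %d = failed node id, %h = failed host, %P = old primary node id,
# %H = new master host
failover_command = '/pg/failover.sh %d %h %P %H'
```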
As soon as health_check_timeout (default 20 seconds) runs out (also affected by the retry delay, max retries and so on), the node is marked dead, so:
t=# select nid,port,st from dblink('host=localhost port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int);
 nid | port |  st
-----+------+------
   0 | 5400 | down
   1 | 5401 | up
   2 | 5402 | up
(3 rows)
So no retries are left and the failover has failed. The first option, obviously, is doing the failover manually. But if the failover failed because of some stupid error, the master is back on rails, and the only problem you have is pgpool thinking the master is offline, you would probably want to leave things as they were before the accident instead, right? Of course, just moving the master back online is not enough. Pgpool has already “degenerated” the primary. Just adding it as a new node will not help either. The worst thing is that, after the event, pgpool will not try to check whether the old master is pg_is_in_recovery() or not, and thus will never accept it as primary. According to the bug tracker you have to “Discard pgpool_status file and do not restore previous status”, with the pgpool -D command.
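In practice, once the old master is healthy again, that recovery looks roughly like the transcript below (a sketch; -m fast is one of pgpool's documented stop modes, and the prompt mirrors the ones used elsewhere in this post):

```
[email protected]:~# pgpool -m fast stop   # stop pgpool; clients are disconnected
[email protected]:~# pgpool -D            # start again, discarding pgpool_status
```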
After discarding the status file, we reconnect (to avoid seeing “server closed the connection unexpectedly”) and run:
t=# select nid,port,st,role from dblink('host=localhost port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int);
 nid | port | st |  role
-----+------+----+---------
   0 | 5400 | up | primary
   1 | 5401 | up | standby
   2 | 5402 | up | standby
(3 rows)
All nodes are back up and running, and pgpool recognises the master.
Finally I want to cover some hints and observations on using pgpool:
Changing backend settings is a little tricky: hostname, port and directory require a reload when adding new nodes, but a restart when editing existing ones, while weight and flag can be altered with just a reload.
Don’t confuse the load_balance_node column values with configuration. If you see just one node with true, it’s not just OK, it’s meant to be so. It does not mean you have only one node in the balancing pool; it just shows which node was chosen for this particular session. Below is a query result with all three nodes participating in the balancing of SELECT statements, with node id 2 chosen:
t=# show pool_nodes;
 node_id | hostname  | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------
       0 | localhost | 5400 | up     | 0.125000  | primary | 61         | false             | 0
       1 | localhost | 5401 | up     | 0.312500  | standby | 8          | false             | 0
       2 | localhost | 5402 | up     | 0.562500  | standby | 11         | true              | 0
(3 rows)
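If you want to pick that column out in a script, an awk one-liner like the sketch below works. The rows are hard-coded sample data mirroring the output above; the pipe-delimited layout is my own simplification, not pgpool's literal output format:

```shell
# Sample rows mimicking "show pool_nodes" (pipe-delimited for simplicity;
# a hypothetical reformatting, not pgpool's literal output)
pool_nodes="\
0|localhost|5400|up|0.125000|primary|61|false|0
1|localhost|5401|up|0.312500|standby|8|false|0
2|localhost|5402|up|0.562500|standby|11|true|0"

# Field 8 is load_balance_node: print the single row where it is true
echo "$pool_nodes" | awk -F'|' '$8 == "true" { print "session balances via node " $1 " (port " $3 ")" }'
```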
You can check which node was chosen for load balancing with show pool_nodes, but you care to know it for your query, not the “show” one, so such a check is not always informative enough. Well, you can monitor which node is used for the current query with something like:
t=# select *,current_setting('port') from now();
              now              | current_setting
-------------------------------+-----------------
 2018-04-09 13:56:17.501779+01 | 5401
(1 row)
Important! But not like this:
t=# select now, setting from now() join pg_settings on name='port';
             now             | setting
-----------------------------+---------
 2018-04-09 13:57:17.5229+01 | 5400
(1 row)
As it will ALWAYS return the master’s port. The same applies to any SELECT involving pg_catalog.
As you noticed in previous parts, I use a more complicated way than just show pool_nodes to list nodes with their state. I do it deliberately to demonstrate how you can make the result manageable. Using a where clause makes the query longer, but the result clearer, skipping everything that distracts attention from our particular task. Compare:
t=# select nid,port,st,role from dblink('host=localhost port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int);
 nid | port | st |  role
-----+------+----+---------
   0 | 5400 | up | primary
   1 | 5401 | up | standby
   2 | 5402 | up | standby
with the output of the initial show pool_nodes…
You can’t compare pgbouncer and pgpool. But if you do, the most important thing to know is that parsing queries in pgpool depends on the PostgreSQL version. So when upgrading PostgreSQL, you need to upgrade pgpool as well, while one pgbouncer instance can serve clusters of versions 8, 9 and 10 from the same ini file.
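For comparison, a pgbouncer [databases] section pointing one instance at clusters of different major versions might look like this (the hostnames, ports and database names are assumptions for illustration):

```
[databases]
legacy84 = host=10.0.0.8  port=5432 dbname=app  ; an 8.4 cluster
prod96   = host=10.0.0.9  port=5432 dbname=app  ; a 9.6 cluster
shiny10  = host=10.0.0.10 port=5432 dbname=app  ; a 10 cluster
```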
Why can’t I use just a failover script instead of pgpool? You can. But pgpool offers it ALONG with memcached, connection pooling, load balancing and split-brain control, and is checked by decades of usage.
A bug tracking system is in place; it is worth visiting if you work with pgpool: https://www.pgpool.net/mantisbt/my_view_page.php
Numerous typos in the documentation, like “bakance” (backend + balance?..), “statemnet”, “allowd”, or mismatches across versions (pool_nodes values used to be int and are now enum, but the link to the old values in pcp_node-info is still there), spoil the impression of this wonderful product. A form to report a “bug” found in the documentation (just like “submit correction” on the postgres docs) would greatly improve it, though.
Important tip: before relying on any step, check it. E.g., after promoting a node you can’t re-promote it (here promoting is not a postgres operation, but rather registration of the node as master for pgpool):
[email protected]:~# sudo -u postgres pcp_promote_node -w -h 127.0.0.1 -U vao -n 1
pcp_promote_node -- Command Successful
[email protected]:~# sudo -u postgres pcp_promote_node -w -h 127.0.0.1 -U vao -n 1
FATAL:  invalid pgpool mode for process recovery request
DETAIL:  specified node is already primary node, can't promote node id 1
Sounds logical and looks great. Yet, if you run this against the wrong node (e.g., node 0, for which pg_is_in_recovery() returns false):
[email protected]:~# for i in $(seq 1 3); do pcp_promote_node -w -h 127.0.0.1 -U vao -n 0; echo $?; done
pcp_promote_node -- Command Successful
0
pcp_promote_node -- Command Successful
0
pcp_promote_node -- Command Successful
0
Which is bad, because you can’t re-promote the node and would expect an error, yet you get exit status 0…
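A defensive wrapper can compensate: check pg_is_in_recovery() on the target before calling pcp_promote_node, instead of trusting the exit status. Below is a sketch; the function name is mine, and the recovery flag is passed in as an argument so the logic can be tried without a live cluster (the real pcp call is left as a comment):

```shell
# Hypothetical guard around pcp_promote_node: refuse to act unless the
# target node reports itself as a standby.
safe_promote() {
  in_recovery=$1  # expected: output of psql -Atc "select pg_is_in_recovery()"
  node_id=$2
  if [ "$in_recovery" != "t" ]; then
    echo "node $node_id is not in recovery, refusing to promote" >&2
    return 1
  fi
  # pcp_promote_node -w -h 127.0.0.1 -U vao -n "$node_id"
  echo "promoting node $node_id"
}

safe_promote f 0 || echo "exit status: $?"   # refused
safe_promote t 1                             # allowed
```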
Important tip: Don’t play too much. Never play on prod!
Playing with the recovery_1st_stage_command using pg_rewind, I thought to try, out of curiosity, another monkey hack: querying pgpool_recovery() without arguments (as I ignore them in my setup anyway) and then just trying to attach the node to pgpool:
[email protected]:~# psql -p 5433 -h localhost template1 -c "SELECT pgpool_recovery('or_1st.sh', '', '', '')"
 pgpool_recovery
-----------------
 t
(1 row)

[email protected]:~# pcp_attach_node -h 127.0.0.1 -U vao -w -n 1
pcp_attach_node -- Command Successful
This stupid idea brought me to:
[email protected]:~# ps -aef | grep pgpool
postgres 15227     1  0 11:22 ?        00:00:00 pgpool -D
postgres 15240 15227  0 11:22 ?        00:00:00 pgpool: health check process(0)
postgres 15241 15227  0 11:22 ?        00:00:00 pgpool: health check process(1)
postgres 15242 15227  0 11:22 ?        00:00:00 pgpool: health check process(2)
postgres 15648 15227  0 11:24 ?        00:00:00 [pgpool]
postgres 16264 15227  0 11:26 ?        00:00:00 pgpool: PCP: wait for connection request
postgres 16266 15227  0 11:26 ?        00:00:00 [pgpool]
postgres 16506 16264  0 11:26 ?        00:00:00 pgpool: PCP: processing recovery request
postgres 16560 15227  0 11:26 ?        00:00:00 [pgpool]
postgres 16835 15227  0 11:26 ?        00:00:00 [pgpool]
postgres 16836 15227  0 11:26 ?        00:00:00 [pgpool]
There is no escape; I have to:
[email protected]:~# kill -9 
[email protected]:~# rm /var/run/pgpoolql/.s.PGSQL.5433
[email protected]:~# rm /var/run/pgpoolql/.s.PGSQL.9898
Above, 5433 is the pgpool port and 9898 is the pcp port. Obviously, after a crash the socket files are not swept away, so you have to do it manually.
- Do a careful reading and play a lot before taking pgpool to production. It’s much harder to find help with pgpool than with postgres itself. Some questions are never answered. Especially when asked in the wrong place (I answered it based on the right place to get the answer)…
- Don’t forget the latest timeline for cascading replication (not really a pgpool hint, but often people don’t understand that in order to pick up a new master it is not enough to specify the right endpoint for the receiver).
- Architecture with a diagram can be found here.
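The timeline hint above can be spelled out as a recovery.conf fragment on the receiving standby (pre-PostgreSQL 12 style, which was current at the time of writing; the conninfo values are assumptions):

```
standby_mode = 'on'
primary_conninfo = 'host=new-master port=5432 user=replicator'
# without this the standby stays on the old timeline after the
# new master is promoted, and stops following it
recovery_target_timeline = 'latest'
```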
In 10 years new promising features (watchdog and virtual IP) and important fixes (e.g. serialize_accept) appeared, but overall it leaves an undervalued impression. The docs have typos that have lived there for 10 years. I don’t believe no one reads the docs. I don’t believe no one noticed. You just can’t report them in any easy way. There are plenty of guns loaded and prepared, lying on the documentation site for the novice user to take, point at their foot and pull the trigger. I have no reasonable idea how to improve it; I’m just warning the shooters. Misinterpreting one parameter can throw you into a desperate position of reverse engineering to find your mistake. All these years pgpool used to be, and remains, kind of a product for advanced users.

Reading the documentation, I could not help recalling the old Russian joke about Sherlock Holmes: Sherlock and Watson fly in a balloon. Suddenly a strong wind blows them thousands of miles away. When they manage to land, they see a girl grazing sheep. Holmes asks the girl: “Darling, where are we?” and the girl replies: “You are on a balloon!”. Sherlock thanks her and, as they take off, says: “The wind took us very far, we are in Russia”. “But how do you know?” Watson asks. “It’s obvious: only in Russia do coders graze sheep,” Sherlock replies. “But how do you know the girl is a coder?” “It’s obvious: she gave us an absolutely precise and totally useless answer.”