blog
Progress Reporting Enhancements in PostgreSQL 12
In PostgreSQL, many DDL commands can take a very long time to execute. PostgreSQL has the ability to report the progress of DDL commands during command execution. Since PostgreSQL 9.6, it has been possible to monitor the progress of running manual VACUUM and autovacuum using a dedicated system catalog (called pg_stat_progress_vacuum).
PostgreSQL 12 has added support for monitoring the progress of a few more commands like CLUSTER, VACUUM FULL,CREATE INDEX, and REINDEX.
Currently, the progress reporting facility is available only for command as below.
- VACUUM command
- CLUSTER command
- VACUUM FULL command
- CREATE INDEX command
- REINDEX command
Why is the Progress Reporting Feature in PostgreSQL Important?
This feature is very important for operators when doing some long-running operations, because it is possible to not blindly wait for an operation to finish.
This is a very useful feature to get some insight like:
- How much total work there is
- How much work already done
Progress reporting feature is also useful when doing performance workload analysis, this is also proving to be useful in evaluating VACUUM job processing for tuning system-level parameters or relation level once depending on load pattern.
Supported Commands and system catalog
DDL Command |
System Catalog |
Supported PostgreSQL Version |
VACUUM |
pg_stat_progress_vacuum |
9.6 |
VACUUM FULL |
pg_stat_progress_cluster |
12 |
CLUSTER |
pg_stat_progress_cluster |
12 |
CREATE INDEX |
pg_stat_progress_create_index |
12 |
REINDEX |
pg_stat_progress_create_index |
12 |
How to Monitor the Progress of the VACUUM Command
Whenever the VACUUM command is running, the pg_stat_progress_vacuum view will contain one row for each backend (including autovacuum worker processes) that is currently vacuuming. The view to check the progress of running VACUUM and VACCUM FULL commands is different because the operation phases of both commands are different.
Operation Phases of the VACUUM Command
- Initializing
- Scanning heap
- Vacuuming indexes
- Vacuuming heap
- Cleaning up indexes
- Truncating heap
- Performing final cleanup
This view is available in PostgreSQL 12 which gives the following information:
postgres=# d pg_stat_progress_vacuum ;
View "pg_catalog.pg_stat_progress_vacuum"
Column | Type | Collation | Nullable | Default
--------------------+---------+-----------+----------+---------
pid | integer | | |
datid | oid | | |
datname | name | | |
relid | oid | | |
phase | text | | |
heap_blks_total | bigint | | |
heap_blks_scanned | bigint | | |
heap_blks_vacuumed | bigint | | |
index_vacuum_count | bigint | | |
max_dead_tuples | bigint | | |
num_dead_tuples | bigint | | |
Example:
postgres=# create table test ( a int, b varchar(40), c timestamp );
CREATE TABLE
postgres=# insert into test ( a, b, c ) select aa, bb, cc from generate_series(1,10000000) aa, md5(aa::varchar) bb, now() cc;
INSERT 0 10000000
postgres=# DELETE FROM test WHERE mod(a,6) = 0;
DELETE 1666666
Session 1:
postgres=# vacuum verbose test;
[. . . waits for completion . . .]
Session 2:
postgres=# select * from pg_stat_progress_vacuum;
-[ RECORD 1 ]------+--------------
pid | 22800
datid | 14187
datname | postgres
relid | 16388
phase | scanning heap
heap_blks_total | 93458
heap_blks_scanned | 80068
heap_blks_vacuumed | 80067
index_vacuum_count | 0
max_dead_tuples | 291
num_dead_tuples | 18
Progress reporting for CLUSTER and VACUUM FULL
CLUSTER and VACUUM FULL command use the same code paths for the relation rewrite, so you can check the progress of both commands using the view pg_stat_progress_cluster.
This view is available in PostgreSQL 12 and it shows the following information:
postgres=# d pg_stat_progress_cluster
View "pg_catalog.pg_stat_progress_cluster"
Column | Type | Collation | Nullable | Default
---------------------+---------+-----------+----------+---------
pid | integer | | |
datid | oid | | |
datname | name | | |
relid | oid | | |
command | text | | |
phase | text | | |
cluster_index_relid | bigint | | |
heap_tuples_scanned | bigint | | |
heap_tuples_written | bigint | | |
heap_blks_total | bigint | | |
heap_blks_scanned | bigint | | |
index_rebuild_count | bigint | | |
Operation Phases of CLUSTER Command
- Initializing
- Seq scanning heap
- Index scanning heap
- Sorting tuples
- Writing new heap
- Swapping relation files
- Rebuilding index
- Performing final cleanup
Example:
postgres=# create table test as select a,md5(a::text) as txt, now() as date from generate_series(1,3000000) a;
SELECT 3000000
postgres=# create index idx1 on test(a);
CREATE INDEX
postgres=# create index idx2 on test(txt);
CREATE INDEX
postgres=# create index idx3 on test(date);
CREATE INDEX
Now execute the CLUSTER table command and see the progress in pg_stat_progress_cluster.
Session 1:
postgres=# cluster verbose test using idx1;
[. . . waits for completion . . .]
Session 2:
postgres=# select * from pg_stat_progress_cluster;
pid | datid | datname | relid | command | phase | cluster_index_relid | heap_tuples_scanned | heap_tuples_written | heap_blks_total | heap_blks_scanned | index_rebuild_count
------+-------+----------+-------+---------+------------------+---------------------+---------------------+---------------------+-----------------+-------------------+---------------------
1273 | 13586 | postgres | 15672 | CLUSTER | rebuilding index | 15680 | 3000000 | 3000000 | 0 | 0 | 2
(1 row)
Progress Reporting for CREATE INDEX and REINDEX
Whenever the CREATE INDEX or REINDEX command is running, the pg_stat_progress_create_index view will contain one row for each backend that is currently creating indexes. The progress reporting feature allows to track also the CONCURRENTLY flavors of CREATE INDEX and REINDEX. The internal execution phases of CREATE INDEX and REINDEX commands are the same, so you can check the progress of both commands using the same view.
postgres=# d pg_stat_progress_create_index
View "pg_catalog.pg_stat_progress_create_index"
Column | Type | Collation | Nullable | Default
--------------------+---------+-----------+----------+---------
pid | integer | | |
datid | oid | | |
datname | name | | |
relid | oid | | |
phase | text | | |
lockers_total | bigint | | |
lockers_done | bigint | | |
current_locker_pid | bigint | | |
blocks_total | bigint | | |
blocks_done | bigint | | |
tuples_total | bigint | | |
tuples_done | bigint | | |
partitions_total | bigint | | |
partitions_done | bigint | | |
Operation Phases of CREATE INDEX / REINDEX
- Initializing
- Waiting for writers before build
- Building index
- Waiting for writers before validation
- Index validation: scanning index
- Index validation: sorting tuples
- Index validation: scanning table
- Waiting for old snapshots
- Waiting for readers before marking dead
- Waiting for readers before dropping
Example:
postgres=# create table test ( a int, b varchar(40), c timestamp );
CREATE TABLE
postgres=# insert into test ( a, b, c ) select aa, bb, cc from generate_series(1,10000000) aa, md5(aa::varchar) bb, now() cc;
INSERT 0 10000000
postgres=# CREATE INDEX idx ON test (b);
CREATE INDEX
Session 1:
postgres=# CREATE INDEX idx ON test (b);
[. . . waits for completion . . .]
Session 2:
postgres=# SELECT * FROM pg_stat_progress_create_index;
-[ RECORD 1 ]------+-------------------------------
pid | 19432
datid | 14187
datname | postgres
relid | 16405
index_relid | 0
command | CREATE INDEX
phase | building index: scanning table
lockers_total | 0
lockers_done | 0
current_locker_pid | 0
blocks_total | 93458
blocks_done | 46047
tuples_total | 0
tuples_done | 0
partitions_total | 0
partitions_done | 0
postgres=# SELECT * FROM pg_stat_progress_create_index;
-[ RECORD 1 ]------+---------------------------------------
pid | 19432
datid | 14187
datname | postgres
relid | 16405
index_relid | 0
command | CREATE INDEX
phase | building index: loading tuples in tree
lockers_total | 0
lockers_done | 0
current_locker_pid | 0
blocks_total | 0
blocks_done | 0
tuples_total | 10000000
tuples_done | 4346240
partitions_total | 0
partitions_done | 0
Conclusion
PostgreSQL version 9.6 onward has the ability to report the progress of certain commands during command execution. This is a really nice feature for DBA’s, Developers, and users to check the progress of long running commands. This reporting capability may extend for some other commands in future. You can read more about this new feature in the PostgreSQL documentation.