blog

Progress Reporting Enhancements in PostgreSQL 12

Severalnines AC

Published: May 15, 2020
Last Updated: May 4, 2022

In PostgreSQL, many DDL commands can take a very long time to execute. PostgreSQL has the ability to report the progress of DDL commands during command execution. Since PostgreSQL 9.6, it has been possible to monitor the progress of running manual VACUUM and autovacuum using a dedicated system catalog (called pg_stat_progress_vacuum).

PostgreSQL 12 has added support for monitoring the progress of a few more commands like CLUSTER, VACUUM FULL,CREATE INDEX, and REINDEX.

Currently, the progress reporting facility is available only for command as below.

VACUUM command
CLUSTER command
VACUUM FULL command
CREATE INDEX command
REINDEX command

Why is the Progress Reporting Feature in PostgreSQL Important?

This feature is very important for operators when doing some long-running operations, because it is possible to not blindly wait for an operation to finish.

This is a very useful feature to get some insight like:

How much total work there is
How much work already done

Progress reporting feature is also useful when doing performance workload analysis, this is also proving to be useful in evaluating VACUUM job processing for tuning system-level parameters or relation level once depending on load pattern.

Supported Commands and system catalog

DDL Command	System Catalog	Supported PostgreSQL Version
VACUUM	pg_stat_progress_vacuum	9.6
VACUUM FULL	pg_stat_progress_cluster	12
CLUSTER	pg_stat_progress_cluster	12
CREATE INDEX	pg_stat_progress_create_index	12
REINDEX	pg_stat_progress_create_index	12

How to Monitor the Progress of the VACUUM Command

Whenever the VACUUM command is running, the pg_stat_progress_vacuum view will contain one row for each backend (including autovacuum worker processes) that is currently vacuuming. The view to check the progress of running VACUUM and VACCUM FULL commands is different because the operation phases of both commands are different.

Operation Phases of the VACUUM Command

Initializing
Scanning heap
Vacuuming indexes
Vacuuming heap
Cleaning up indexes
Truncating heap
Performing final cleanup

This view is available in PostgreSQL 12 which gives the following information:

postgres=# d pg_stat_progress_vacuum ;

           View "pg_catalog.pg_stat_progress_vacuum"

       Column       |  Type   | Collation | Nullable | Default

--------------------+---------+-----------+----------+---------

 pid                | integer |           |          |

 datid              | oid     |           |          |

 datname            | name    |           |          |

 relid              | oid     |           |          |

 phase              | text    |           |          |

 heap_blks_total    | bigint  |           |          |

 heap_blks_scanned  | bigint  |           |          |

 heap_blks_vacuumed | bigint  |           |          |

 index_vacuum_count | bigint  |           |          |

 max_dead_tuples    | bigint  |           |          |

 num_dead_tuples    | bigint  |           |          |

Example:

postgres=# create table test ( a int, b varchar(40), c timestamp );

CREATE TABLE

postgres=# insert into test ( a, b, c ) select aa, bb, cc from generate_series(1,10000000) aa, md5(aa::varchar) bb, now() cc;

INSERT 0 10000000

postgres=# DELETE FROM test WHERE mod(a,6) = 0;

DELETE 1666666

Session 1:

postgres=# vacuum verbose test;

[. . . waits for completion . . .]

Session 2:

postgres=# select * from pg_stat_progress_vacuum;

-[ RECORD 1 ]------+--------------

pid                | 22800

datid              | 14187

datname            | postgres

relid              | 16388

phase              | scanning heap

heap_blks_total    | 93458

heap_blks_scanned  | 80068

heap_blks_vacuumed | 80067

index_vacuum_count | 0

max_dead_tuples    | 291

num_dead_tuples    | 18

Progress reporting for CLUSTER and VACUUM FULL

CLUSTER and VACUUM FULL command use the same code paths for the relation rewrite, so you can check the progress of both commands using the view pg_stat_progress_cluster.

This view is available in PostgreSQL 12 and it shows the following information:

postgres=# d pg_stat_progress_cluster

           View "pg_catalog.pg_stat_progress_cluster"

       Column        |  Type   | Collation | Nullable | Default

---------------------+---------+-----------+----------+---------

 pid                 | integer |           |          | 

 datid               | oid     |           |          | 

 datname             | name    |           |          | 

 relid               | oid     |           |          | 

 command             | text    |           |          | 

 phase               | text    |           |          | 

 cluster_index_relid | bigint  |           |          | 

 heap_tuples_scanned | bigint  |           |          | 

 heap_tuples_written | bigint  |           |          | 

 heap_blks_total     | bigint  |           |          | 

 heap_blks_scanned   | bigint  |           |          | 

 index_rebuild_count | bigint  |           |          |

Operation Phases of CLUSTER Command

Initializing
Seq scanning heap
Index scanning heap
Sorting tuples
Writing new heap
Swapping relation files
Rebuilding index
Performing final cleanup

Example:

postgres=# create table test as select a,md5(a::text) as txt, now() as date from generate_series(1,3000000) a;

SELECT 3000000

postgres=# create index idx1 on test(a);

CREATE INDEX

postgres=# create index idx2 on test(txt);

CREATE INDEX

postgres=# create index idx3 on test(date);

CREATE INDEX

Now execute the CLUSTER table command and see the progress in pg_stat_progress_cluster.

Session 1:

postgres=# cluster verbose test using idx1;

[. . . waits for completion . . .]

Session 2:

postgres=# select * from pg_stat_progress_cluster;

 pid  | datid | datname  | relid | command |      phase       | cluster_index_relid | heap_tuples_scanned | heap_tuples_written | heap_blks_total | heap_blks_scanned | index_rebuild_count 

------+-------+----------+-------+---------+------------------+---------------------+---------------------+---------------------+-----------------+-------------------+---------------------

 1273 | 13586 | postgres | 15672 | CLUSTER | rebuilding index |               15680 |             3000000 |             3000000 |               0 |                 0 |                   2

(1 row)

Progress Reporting for CREATE INDEX and REINDEX

Whenever the CREATE INDEX or REINDEX command is running, the pg_stat_progress_create_index view will contain one row for each backend that is currently creating indexes. The progress reporting feature allows to track also the CONCURRENTLY flavors of CREATE INDEX and REINDEX. The internal execution phases of CREATE INDEX and REINDEX commands are the same, so you can check the progress of both commands using the same view.

postgres=# d pg_stat_progress_create_index 

        View "pg_catalog.pg_stat_progress_create_index"

       Column       |  Type   | Collation | Nullable | Default

--------------------+---------+-----------+----------+---------

 pid                | integer |           |          | 

 datid              | oid     |           |          | 

 datname            | name    |           |          | 

 relid              | oid     |           |          | 

 phase              | text    |           |          | 

 lockers_total      | bigint  |           |          | 

 lockers_done       | bigint  |           |          | 

 current_locker_pid | bigint  |           |          | 

 blocks_total       | bigint  |           |          | 

 blocks_done        | bigint  |           |          | 

 tuples_total       | bigint  |           |          | 

 tuples_done        | bigint  |           |          | 

 partitions_total   | bigint  |           |          | 

 partitions_done    | bigint  |           |          |

Operation Phases of CREATE INDEX / REINDEX

Initializing
Waiting for writers before build
Building index
Waiting for writers before validation
Index validation: scanning index
Index validation: sorting tuples
Index validation: scanning table
Waiting for old snapshots
Waiting for readers before marking dead
Waiting for readers before dropping

Example:

postgres=# create table test ( a int, b varchar(40), c timestamp );

CREATE TABLE



postgres=# insert into test ( a, b, c ) select aa, bb, cc from generate_series(1,10000000) aa, md5(aa::varchar) bb, now() cc;

INSERT 0 10000000



postgres=# CREATE INDEX idx ON test (b);

CREATE INDEX

Session 1:

postgres=# CREATE INDEX idx ON test (b);

[. . . waits for completion . . .]

Session 2:

postgres=# SELECT * FROM pg_stat_progress_create_index;

-[ RECORD 1 ]------+-------------------------------

pid                | 19432

datid              | 14187

datname            | postgres

relid              | 16405

index_relid        | 0

command            | CREATE INDEX

phase              | building index: scanning table

lockers_total      | 0

lockers_done       | 0

current_locker_pid | 0

blocks_total       | 93458

blocks_done        | 46047

tuples_total       | 0

tuples_done        | 0

partitions_total   | 0

partitions_done    | 0



postgres=# SELECT * FROM pg_stat_progress_create_index;

-[ RECORD 1 ]------+---------------------------------------

pid                | 19432

datid              | 14187

datname            | postgres

relid              | 16405

index_relid        | 0

command            | CREATE INDEX

phase              | building index: loading tuples in tree

lockers_total      | 0

lockers_done       | 0

current_locker_pid | 0

blocks_total       | 0

blocks_done        | 0

tuples_total       | 10000000

tuples_done        | 4346240

partitions_total   | 0

partitions_done    | 0

Conclusion

PostgreSQL version 9.6 onward has the ability to report the progress of certain commands during command execution. This is a really nice feature for DBA’s, Developers, and users to check the progress of long running commands. This reporting capability may extend for some other commands in future. You can read more about this new feature in the PostgreSQL documentation.

How ClusterControl Saved Christmas – Part 6

Operational considerations for ClickHouse in hybrid environments

How ClusterControl Saved Christmas – Part 5

How ClusterControl Saved Christmas – Part 4

Why is the Progress Reporting Feature in PostgreSQL Important?

Supported Commands and system catalog

How to Monitor the Progress of the VACUUM Command

Operation Phases of the VACUUM Command

Example:

Session 1:

Session 2:

Progress reporting for CLUSTER and VACUUM FULL

Operation Phases of CLUSTER Command

Example:

Session 1:

Session 2:

Progress Reporting for CREATE INDEX and REINDEX

Operation Phases of CREATE INDEX / REINDEX

Example:

Session 1:

Session 2:

Conclusion

Recommended

How ClusterControl Saved Christmas – Part 6

Operational considerations for ClickHouse in hybrid environments

How ClusterControl Saved Christmas – Part 5

How ClusterControl Saved Christmas – Part 4

Subscribe below to be notified of fresh posts