Progress Reporting Enhancements in PostgreSQL 12

Severalnines

In PostgreSQL, many DDL commands can take a very long time to execute. PostgreSQL has the ability to report the progress of DDL commands during command execution. Since PostgreSQL 9.6, it has been possible to monitor the progress of running manual VACUUM and autovacuum using a dedicated system catalog (called pg_stat_progress_vacuum).

PostgreSQL 12 has added support for monitoring the progress of a few more commands like CLUSTER, VACUUM FULL,CREATE INDEX, and REINDEX.

Currently, the progress reporting facility is available only for command as below.

  • VACUUM command
  • CLUSTER command
  • VACUUM FULL command
  • CREATE INDEX command
  • REINDEX command

Why is the Progress Reporting Feature in PostgreSQL Important?

This feature is very important for operators when doing some long-running operations, because it is possible to not blindly wait for an operation to finish. 

This is a very useful feature to get some insight like:

  • How much total work there is
  • How much work already done 

Progress reporting feature is also useful when doing performance workload analysis, this is also proving to be useful in evaluating VACUUM job processing for tuning system-level parameters or relation level once depending on load pattern. 

Supported Commands and system catalog

DDL Command

System Catalog

Supported PostgreSQL Version

VACUUM

pg_stat_progress_vacuum

9.6

VACUUM FULL

pg_stat_progress_cluster

12

CLUSTER

pg_stat_progress_cluster

12

CREATE INDEX

pg_stat_progress_create_index

12

REINDEX

pg_stat_progress_create_index

12

How to Monitor the Progress of the VACUUM Command

Whenever the VACUUM command is running, the pg_stat_progress_vacuum view will contain one row for each backend (including autovacuum worker processes) that is currently vacuuming. The view to check the progress of running VACUUM and VACCUM FULL commands is different because the operation phases of both commands are different.

Operation Phases of the VACUUM Command

  1. Initializing
  2. Scanning heap
  3. Vacuuming indexes
  4. Vacuuming heap
  5. Cleaning up indexes
  6. Truncating heap
  7. Performing final cleanup

This view is available in PostgreSQL 12 which gives the following information:

postgres=# \d pg_stat_progress_vacuum ;

           View "pg_catalog.pg_stat_progress_vacuum"

       Column       |  Type   | Collation | Nullable | Default

--------------------+---------+-----------+----------+---------

 pid                | integer |           |          |

 datid              | oid     |           |          |

 datname            | name    |           |          |

 relid              | oid     |           |          |

 phase              | text    |           |          |

 heap_blks_total    | bigint  |           |          |

 heap_blks_scanned  | bigint  |           |          |

 heap_blks_vacuumed | bigint  |           |          |

 index_vacuum_count | bigint  |           |          |

 max_dead_tuples    | bigint  |           |          |

 num_dead_tuples    | bigint  |           |          |

Example:

postgres=# create table test ( a int, b varchar(40), c timestamp );

CREATE TABLE

postgres=# insert into test ( a, b, c ) select aa, bb, cc from generate_series(1,10000000) aa, md5(aa::varchar) bb, now() cc;

INSERT 0 10000000

​postgres=# DELETE FROM test WHERE mod(a,6) = 0;

DELETE 1666666

Session 1:

postgres=# vacuum verbose test;

[. . . waits for completion . . .]

Session 2:

postgres=# select * from pg_stat_progress_vacuum;

-[ RECORD 1 ]------+--------------

pid                | 22800

datid              | 14187

datname            | postgres

relid              | 16388

phase              | scanning heap

heap_blks_total    | 93458

heap_blks_scanned  | 80068

heap_blks_vacuumed | 80067

index_vacuum_count | 0

max_dead_tuples    | 291

num_dead_tuples    | 18

Progress reporting for CLUSTER and VACUUM FULL

CLUSTER and VACUUM FULL command use the same code paths for the relation rewrite, so you can check the progress of both commands using the view pg_stat_progress_cluster.

This view is available in PostgreSQL 12 and it shows the following information: 

postgres=# \d pg_stat_progress_cluster

           View "pg_catalog.pg_stat_progress_cluster"

       Column        |  Type   | Collation | Nullable | Default

---------------------+---------+-----------+----------+---------

 pid                 | integer |           |          | 

 datid               | oid     |           |          | 

 datname             | name    |           |          | 

 relid               | oid     |           |          | 

 command             | text    |           |          | 

 phase               | text    |           |          | 

 cluster_index_relid | bigint  |           |          | 

 heap_tuples_scanned | bigint  |           |          | 

 heap_tuples_written | bigint  |           |          | 

 heap_blks_total     | bigint  |           |          | 

 heap_blks_scanned   | bigint  |           |          | 

 index_rebuild_count | bigint  |           |          | 

Operation Phases of CLUSTER Command

  1. Initializing
  2. Seq scanning heap
  3. Index scanning heap
  4. Sorting tuples
  5. Writing new heap
  6. Swapping relation files
  7. Rebuilding index
  8. Performing final cleanup

Example:

postgres=# create table test as select a,md5(a::text) as txt, now() as date from generate_series(1,3000000) a;

SELECT 3000000

postgres=# create index idx1 on test(a);

CREATE INDEX

postgres=# create index idx2 on test(txt);

CREATE INDEX

postgres=# create index idx3 on test(date);

CREATE INDEX

Now execute the CLUSTER table command and see the progress in pg_stat_progress_cluster. 

Session 1:

postgres=# cluster verbose test using idx1;

[. . . waits for completion . . .]

Session 2:

postgres=# select * from pg_stat_progress_cluster;

 pid  | datid | datname  | relid | command |      phase       | cluster_index_relid | heap_tuples_scanned | heap_tuples_written | heap_blks_total | heap_blks_scanned | index_rebuild_count 

------+-------+----------+-------+---------+------------------+---------------------+---------------------+---------------------+-----------------+-------------------+---------------------

 1273 | 13586 | postgres | 15672 | CLUSTER | rebuilding index |               15680 |             3000000 |             3000000 |               0 |                 0 |                   2

(1 row)

Progress Reporting for CREATE INDEX and REINDEX

Whenever the CREATE INDEX or REINDEX command is running, the pg_stat_progress_create_index view will contain one row for each backend that is currently creating indexes. The progress reporting feature allows to track also the CONCURRENTLY flavors of CREATE INDEX and REINDEX. The internal execution phases of CREATE INDEX and REINDEX commands are the same, so you can check the progress of both commands using the same view.

postgres=# \d pg_stat_progress_create_index 

        View "pg_catalog.pg_stat_progress_create_index"

       Column       |  Type   | Collation | Nullable | Default

--------------------+---------+-----------+----------+---------

 pid                | integer |           |          | 

 datid              | oid     |           |          | 

 datname            | name    |           |          | 

 relid              | oid     |           |          | 

 phase              | text    |           |          | 

 lockers_total      | bigint  |           |          | 

 lockers_done       | bigint  |           |          | 

 current_locker_pid | bigint  |           |          | 

 blocks_total       | bigint  |           |          | 

 blocks_done        | bigint  |           |          | 

 tuples_total       | bigint  |           |          | 

 tuples_done        | bigint  |           |          | 

 partitions_total   | bigint  |           |          | 

 partitions_done    | bigint  |           |          | 

Operation Phases of CREATE INDEX / REINDEX

  1. Initializing
  2. Waiting for writers before build
  3. Building index
  4. Waiting for writers before validation
  5. Index validation: scanning index
  6. Index validation:  sorting tuples
  7. Index validation: scanning table
  8. Waiting for old snapshots
  9. Waiting for readers before marking dead
  10. Waiting for readers before dropping

Example:

postgres=# create table test ( a int, b varchar(40), c timestamp );

CREATE TABLE



postgres=# insert into test ( a, b, c ) select aa, bb, cc from generate_series(1,10000000) aa, md5(aa::varchar) bb, now() cc;

INSERT 0 10000000



postgres=# CREATE INDEX idx ON test (b);

CREATE INDEX

Session 1:

postgres=# CREATE INDEX idx ON test (b);

[. . . waits for completion . . .]

Session 2:

postgres=# SELECT * FROM pg_stat_progress_create_index;

-[ RECORD 1 ]------+-------------------------------

pid                | 19432

datid              | 14187

datname            | postgres

relid              | 16405

index_relid        | 0

command            | CREATE INDEX

phase              | building index: scanning table

lockers_total      | 0

lockers_done       | 0

current_locker_pid | 0

blocks_total       | 93458

blocks_done        | 46047

tuples_total       | 0

tuples_done        | 0

partitions_total   | 0

partitions_done    | 0



postgres=# SELECT * FROM pg_stat_progress_create_index;

-[ RECORD 1 ]------+---------------------------------------

pid                | 19432

datid              | 14187

datname            | postgres

relid              | 16405

index_relid        | 0

command            | CREATE INDEX

phase              | building index: loading tuples in tree

lockers_total      | 0

lockers_done       | 0

current_locker_pid | 0

blocks_total       | 0

blocks_done        | 0

tuples_total       | 10000000

tuples_done        | 4346240

partitions_total   | 0

partitions_done    | 0

Conclusion

PostgreSQL version 9.6 onward has the ability to report the progress of certain commands during command execution. This is a really nice feature for DBA’s, Developers, and users to check the progress of long running commands. This reporting capability may extend for some other commands in future. You can read more about this new feature in the PostgreSQL documentation.

ClusterControl
The only management system you’ll ever need to take control of your open source database infrastructure.