Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes

From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes
Date: 2024-08-29 14:04:05
Message-ID: CAECtzeXXuMkw-RVGTWvHGOJsmFdsRY+jK0ndQa80sw46y2uvVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

This patch was a bit discussed on [1], and with more details on [2]. It
introduces four new columns in pg_stat_all_tables:

* parallel_seq_scan
* last_parallel_seq_scan
* parallel_idx_scan
* last_parallel_idx_scan

and two new columns in pg_stat_all_indexes:

* parallel_idx_scan
* last_parallel_idx_scan

As Benoit said yesterday, the intent is to help administrators evaluate the
usage of parallel workers in their databases and help configuring
parallelization usage.

A test script (test.sql) is attached. You can execute it with "psql -Xef
test.sql your_database" (your_database should not contain a t1 table as it
will be dropped and recreated).

Here is its result, a bit commented:

DROP TABLE IF EXISTS t1;
DROP TABLE
CREATE TABLE t1 (id integer);
CREATE TABLE
INSERT INTO t1 SELECT generate_series(1, 10_000_000);
INSERT 0 10000000
VACUUM ANALYZE t1;
VACUUM
SELECT relname, seq_scan, last_seq_scan, parallel_seq_scan,
last_parallel_seq_scan FROM pg_stat_user_tables WHERE relname='t1'
-[ RECORD 1 ]----------+---
relname | t1
seq_scan | 0
last_seq_scan |
parallel_seq_scan | 0
last_parallel_seq_scan |

==> no scan at all, the table has just been created

SELECT * FROM t1 LIMIT 1;
id
----
1
(1 row)

SELECT pg_sleep(1);
SELECT relname, seq_scan, last_seq_scan, parallel_seq_scan,
last_parallel_seq_scan FROM pg_stat_user_tables WHERE relname='t1'
-[ RECORD 1 ]----------+------------------------------
relname | t1
seq_scan | 1
last_seq_scan | 2024-08-29 15:43:17.377182+02
parallel_seq_scan | 0
last_parallel_seq_scan |

==> one sequential scan, no parallelization

SELECT count(*) FROM t1;
count
----------
10000000
(1 row)

SELECT pg_sleep(1);
SELECT relname, seq_scan, last_seq_scan, parallel_seq_scan,
last_parallel_seq_scan FROM pg_stat_user_tables WHERE relname='t1'
-[ RECORD 1 ]----------+------------------------------
relname | t1
seq_scan | 4
last_seq_scan | 2024-08-29 15:43:18.504533+02
parallel_seq_scan | 3
last_parallel_seq_scan | 2024-08-29 15:43:18.504533+02

==> one parallel sequential scan
==> I use the default configuration, so parallel_leader_participation = on,
max_parallel_workers_per_gather = 2
==> meaning 3 parallel sequential scans (1 leader, two workers)
==> take note that seq_scan was also incremented... we didn't change the
previous behaviour for this column

CREATE INDEX ON t1(id);
CREATE INDEX
SELECT
indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]----------+----------
indexrelname | t1_id_idx
idx_scan | 0
last_idx_scan |
parallel_idx_scan | 0
last_parallel_idx_scan |
idx_tup_read | 0
idx_tup_fetch | 0

==> no scan at all, the index has just been created

SELECT * FROM t1 WHERE id=150000;
id
--------
150000
(1 row)

SELECT pg_sleep(1);
SELECT
indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]----------+------------------------------
indexrelname | t1_id_idx
idx_scan | 1
last_idx_scan | 2024-08-29 15:43:22.020853+02
parallel_idx_scan | 0
last_parallel_idx_scan |
idx_tup_read | 1
idx_tup_fetch | 0

==> one index scan, no parallelization

SELECT * FROM t1 WHERE id BETWEEN 100000 AND 400000;
SELECT pg_sleep(1);
pg_sleep
----------

(1 row)

SELECT
indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]----------+------------------------------
indexrelname | t1_id_idx
idx_scan | 2
last_idx_scan | 2024-08-29 15:43:23.136665+02
parallel_idx_scan | 0
last_parallel_idx_scan |
idx_tup_read | 300002
idx_tup_fetch | 0

==> another index scan, no parallelization

SELECT count(*) FROM t1 WHERE id BETWEEN 100000 AND 400000;
count
--------
300001
(1 row)

SELECT pg_sleep(1);
SELECT
indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]----------+-----------------------------
indexrelname | t1_id_idx
idx_scan | 5
last_idx_scan | 2024-08-29 15:43:24.16057+02
parallel_idx_scan | 3
last_parallel_idx_scan | 2024-08-29 15:43:24.16057+02
idx_tup_read | 600003
idx_tup_fetch | 0

==> one parallel index scan
==> same thing, 3 parallel index scans (1 leader, two workers)
==> also, take note that idx_scan was also incremented... we didn't change
the previous behaviour for this column

First time I had to add new columns to a statistics catalog. I'm actually
not sure that we were right to change pg_proc.dat manually. We'll probably
have to fix this.

Documentation is done, but maybe we should also add that seq_scan and
idx_scan also include parallel scan.

Yet to be done: tests. Once there's an agreement on this patch, we'll work
on the tests.

This has been a collective work with Benoit Lobréau, Jehan-Guillaume de
Rorthais, and Franck Boudehen.

Thanks.

Regards.

[1]
https://www.postgresql.org/message-id/flat/b4220d15-2e21-0e98-921b-b9892543cc93%40dalibo.com
[2]
https://www.postgresql.org/message-id/flat/d657df20-c4bf-63f6-e74c-cb85a81d0383%40dalibo.com

--
Guillaume.

Attachment Content-Type Size
test.sql application/sql 1.5 KB
0001-Add-parallel-columns-for-pg_stat_all_tables-indexes.patch text/x-patch 10.9 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-08-29 14:15:47 Re: Add contrib/pg_logicalsnapinspect
Previous Message Amul Sul 2024-08-29 13:46:49 Re: pg_verifybackup: TAR format backup verification