From: | Guillaume Lelarge <guillaume(at)lelarge(dot)info> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes |
Date: | 2024-08-29 14:04:05 |
Message-ID: | CAECtzeXXuMkw-RVGTWvHGOJsmFdsRY+jK0ndQa80sw46y2uvVQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
This patch was a bit discussed on [1], and with more details on [2]. It
introduces four new columns in pg_stat_all_tables:
* parallel_seq_scan
* last_parallel_seq_scan
* parallel_idx_scan
* last_parallel_idx_scan
and two new columns in pg_stat_all_indexes:
* parallel_idx_scan
* last_parallel_idx_scan
As Benoit said yesterday, the intent is to help administrators evaluate the
usage of parallel workers in their databases and help configuring
parallelization usage.
A test script (test.sql) is attached. You can execute it with "psql -Xef
test.sql your_database" (your_database should not contain a t1 table as it
will be dropped and recreated).
Here is its result, a bit commented:
DROP TABLE IF EXISTS t1;
DROP TABLE
CREATE TABLE t1 (id integer);
CREATE TABLE
INSERT INTO t1 SELECT generate_series(1, 10_000_000);
INSERT 0 10000000
VACUUM ANALYZE t1;
VACUUM
SELECT relname, seq_scan, last_seq_scan, parallel_seq_scan,
last_parallel_seq_scan FROM pg_stat_user_tables WHERE relname='t1'
-[ RECORD 1 ]----------+---
relname | t1
seq_scan | 0
last_seq_scan |
parallel_seq_scan | 0
last_parallel_seq_scan |
==> no scan at all, the table has just been created
SELECT * FROM t1 LIMIT 1;
id
----
1
(1 row)
SELECT pg_sleep(1);
SELECT relname, seq_scan, last_seq_scan, parallel_seq_scan,
last_parallel_seq_scan FROM pg_stat_user_tables WHERE relname='t1'
-[ RECORD 1 ]----------+------------------------------
relname | t1
seq_scan | 1
last_seq_scan | 2024-08-29 15:43:17.377182+02
parallel_seq_scan | 0
last_parallel_seq_scan |
==> one sequential scan, no parallelization
SELECT count(*) FROM t1;
count
----------
10000000
(1 row)
SELECT pg_sleep(1);
SELECT relname, seq_scan, last_seq_scan, parallel_seq_scan,
last_parallel_seq_scan FROM pg_stat_user_tables WHERE relname='t1'
-[ RECORD 1 ]----------+------------------------------
relname | t1
seq_scan | 4
last_seq_scan | 2024-08-29 15:43:18.504533+02
parallel_seq_scan | 3
last_parallel_seq_scan | 2024-08-29 15:43:18.504533+02
==> one parallel sequential scan
==> I use the default configuration, so parallel_leader_participation = on,
max_parallel_workers_per_gather = 2
==> meaning 3 parallel sequential scans (1 leader, two workers)
==> take note that seq_scan was also incremented... we didn't change the
previous behaviour for this column
CREATE INDEX ON t1(id);
CREATE INDEX
SELECT
indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]----------+----------
indexrelname | t1_id_idx
idx_scan | 0
last_idx_scan |
parallel_idx_scan | 0
last_parallel_idx_scan |
idx_tup_read | 0
idx_tup_fetch | 0
==> no scan at all, the index has just been created
SELECT * FROM t1 WHERE id=150000;
id
--------
150000
(1 row)
SELECT pg_sleep(1);
SELECT
indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]----------+------------------------------
indexrelname | t1_id_idx
idx_scan | 1
last_idx_scan | 2024-08-29 15:43:22.020853+02
parallel_idx_scan | 0
last_parallel_idx_scan |
idx_tup_read | 1
idx_tup_fetch | 0
==> one index scan, no parallelization
SELECT * FROM t1 WHERE id BETWEEN 100000 AND 400000;
SELECT pg_sleep(1);
pg_sleep
----------
(1 row)
SELECT
indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]----------+------------------------------
indexrelname | t1_id_idx
idx_scan | 2
last_idx_scan | 2024-08-29 15:43:23.136665+02
parallel_idx_scan | 0
last_parallel_idx_scan |
idx_tup_read | 300002
idx_tup_fetch | 0
==> another index scan, no parallelization
SELECT count(*) FROM t1 WHERE id BETWEEN 100000 AND 400000;
count
--------
300001
(1 row)
SELECT pg_sleep(1);
SELECT
indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]----------+-----------------------------
indexrelname | t1_id_idx
idx_scan | 5
last_idx_scan | 2024-08-29 15:43:24.16057+02
parallel_idx_scan | 3
last_parallel_idx_scan | 2024-08-29 15:43:24.16057+02
idx_tup_read | 600003
idx_tup_fetch | 0
==> one parallel index scan
==> same thing, 3 parallel index scans (1 leader, two workers)
==> also, take note that idx_scan was also incremented... we didn't change
the previous behaviour for this column
First time I had to add new columns to a statistics catalog. I'm actually
not sure that we were right to change pg_proc.dat manually. We'll probably
have to fix this.
Documentation is done, but maybe we should also add that seq_scan and
idx_scan also include parallel scan.
Yet to be done: tests. Once there's an agreement on this patch, we'll work
on the tests.
This has been a collective work with Benoit Lobréau, Jehan-Guillaume de
Rorthais, and Franck Boudehen.
Thanks.
Regards.
[1]
https://www.postgresql.org/message-id/flat/b4220d15-2e21-0e98-921b-b9892543cc93%40dalibo.com
[2]
https://www.postgresql.org/message-id/flat/d657df20-c4bf-63f6-e74c-cb85a81d0383%40dalibo.com
--
Guillaume.
Attachment | Content-Type | Size |
---|---|---|
test.sql | application/sql | 1.5 KB |
0001-Add-parallel-columns-for-pg_stat_all_tables-indexes.patch | text/x-patch | 10.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Bertrand Drouvot | 2024-08-29 14:15:47 | Re: Add contrib/pg_logicalsnapinspect |
Previous Message | Amul Sul | 2024-08-29 13:46:49 | Re: pg_verifybackup: TAR format backup verification |