Quick Links

Add parallel columns for pg_stat_statements

From:	Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Add parallel columns for pg_stat_statements
Date:	2024-08-29 20:08:23
Message-ID:	CAECtzeWtTGOK0UgKXdDGpfTVSa5bd_VbUt6K6xn8P7X+_dZqKw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello,

This patch was a bit discussed on [1], and with more details on [2]. It's
based on another patch sent in 2022 (see [3]). It introduces seven new
columns in pg_stat_statements:

* parallelized_queries_planned, number of times the query has been planned
to be parallelized,
* parallelized_queries_launched, number of times the query has been
executed with parallelization,
* parallelized_workers_planned, number of parallel workers planned for
this query,
* parallelized_workers_launched, number of parallel workers executed for
this query,
* parallelized_nodes, number of parallelized nodes,
* parallelized_nodes_all_workers, number of parallelized nodes which had
all requested workers,
* parallelized_nodes_no_worker, number of parallelized nodes which had no
requested workers.

As Benoit said yesterday, the intent is to help administrators evaluate the
usage of parallel workers in their databases and help configuring
parallelization usage.

A test script (test2.sql) is attached. You can execute it with "psql -Xef
test2.sql your_database" (your_database should not contain a t1 table as it
will be dropped and recreated).

Here is its result, a bit commented:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION
SELECT pg_stat_statements_reset();
pg_stat_statements_reset
-------------------------------
2024-08-29 18:00:35.314557+02
(1 row)

DROP TABLE IF EXISTS t1;
DROP TABLE
CREATE TABLE t1 (id integer);
CREATE TABLE
INSERT INTO t1 SELECT generate_series(1, 10_000_000);
INSERT 0 10000000
VACUUM ANALYZE t1;
VACUUM
SELECT query,
parallelized_queries_planned, parallelized_queries_launched,
parallelized_workers_planned, parallelized_workers_launched,
parallelized_nodes, parallelized_nodes_all_workers,
parallelized_nodes_no_worker
FROM pg_stat_statements
WHERE query LIKE 'SELECT%t1%'
(0 rows)

SELECT * FROM t1 LIMIT 1;
id
----
1
(1 row)

==> no parallelization

SELECT count(*) FROM t1;
count
----------
10000000
(1 row)

SELECT pg_sleep(1);
SELECT query,
parallelized_queries_planned, parallelized_queries_launched,
parallelized_workers_planned, parallelized_workers_launched,
parallelized_nodes, parallelized_nodes_all_workers,
parallelized_nodes_no_worker
FROM pg_stat_statements
WHERE query LIKE 'SELECT%t1%'
-[ RECORD 1 ]------------------+--------------------------
query | SELECT count(*) FROM t1
parallelized_queries_planned | 1
parallelized_queries_launched | 1
parallelized_workers_planned | 2
parallelized_workers_launched | 2
parallelized_nodes | 1
parallelized_nodes_all_workers | 1
parallelized_nodes_no_worker | 0
-[ RECORD 2 ]------------------+--------------------------
query | SELECT * FROM t1 LIMIT $1
parallelized_queries_planned | 0
parallelized_queries_launched | 0
parallelized_workers_planned | 0
parallelized_workers_launched | 0
parallelized_nodes | 0
parallelized_nodes_all_workers | 0
parallelized_nodes_no_worker | 0

==> one parallelized query
==> I have the default configuration, so 2 for
max_parallel_worker_per_gather
==> hence two workers, with one node with all workers

SET max_parallel_workers_per_gather TO 5;
SET
SELECT count(*) FROM t1;
count
----------
10000000
(1 row)

SELECT pg_sleep(1);
SELECT query,
parallelized_queries_planned, parallelized_queries_launched,
parallelized_workers_planned, parallelized_workers_launched,
parallelized_nodes, parallelized_nodes_all_workers,
parallelized_nodes_no_worker
FROM pg_stat_statements
WHERE query LIKE 'SELECT%t1%'
-[ RECORD 1 ]------------------+--------------------------
query | SELECT count(*) FROM t1
parallelized_queries_planned | 2
parallelized_queries_launched | 2
parallelized_workers_planned | 6
parallelized_workers_launched | 6
parallelized_nodes | 2
parallelized_nodes_all_workers | 2
parallelized_nodes_no_worker | 0
-[ RECORD 2 ]------------------+--------------------------
query | SELECT * FROM t1 LIMIT $1
parallelized_queries_planned | 0
parallelized_queries_launched | 0
parallelized_workers_planned | 0
parallelized_workers_launched | 0
parallelized_nodes | 0
parallelized_nodes_all_workers | 0
parallelized_nodes_no_worker | 0

==> another parallelized query
==> with 5 as max_parallel_workers_per_gather, but only 4 workers to use
==> hence four workers, with one node with all workers

The biggest issue with this patch is that it's unable to know workers for
maintenance queries (CREATE INDEX for BTree and VACUUM).

Documentation is done, tests are missing. Once there's an agreement on this
patch, we'll work on the tests.

This has been a collective work with Benoit Lobréau, Jehan-Guillaume de
Rorthais, and Franck Boudehen.

Thanks.

Regards.

[1]
https://www.postgresql.org/message-id/flat/b4220d15-2e21-0e98-921b-b9892543cc93%40dalibo.com
[2]
https://www.postgresql.org/message-id/flat/d657df20-c4bf-63f6-e74c-cb85a81d0383%40dalibo.com
[3]
https://www.postgresql.org/message-id/flat/6acbe570-068e-bd8e-95d5-00c737b865e8%40gmail.com

--
Guillaume.

Attachment	Content-Type	Size
0001-Add-parallel-columns-to-pg_stat_statements.patch	text/x-patch	19.4 KB
test2.sql	application/sql	1.5 KB

Responses

Re: Add parallel columns for pg_stat_statements at 2024-10-03 07:14:58 from Michael Paquier

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jacob Champion	2024-08-29 20:44:01	Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
Previous Message	Jeff Davis	2024-08-29 19:29:43	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM