Re: Add parallel columns for pg_stat_statements

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add parallel columns for pg_stat_statements
Date: 2024-10-07 00:18:29
Message-ID: ZwMo1TcbWerEdqx6@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Oct 06, 2024 at 03:32:02PM +0200, Guillaume Lelarge wrote:
> I'm not sure I follow. That would mean that every time a query is executed,
> it always gets the same amount of workers. Which is not guaranteed to be
> true.
>
> I would agree, though, that parallelized_queries_launched is probably not
> that interesting. I could get rid of it if you think it should go away.

My point is that these stats are useful to know which action may have
to be taken when reaching a mean, and numbers in pg_stat_statements
offer hints that something is going wrong and that a closer lookup at
an EXPLAIN plan may be required, particularly if the total number of
workers planned and launched aggregated in the counters is unbalanced
across queries. If the planned/launched ratio is balanced across most
queries queries, a GUC adjustment may be OK. If the ratio is very
unbalanced in a lower set of queries, I'd also look at tweaking GUCs
instead like the per_gather. These counters give information that one
or the other may be required.

> Well, I don't see this as an overlap. Rather more information.

Later versions of Benoit's patch have been accumulating this data in
the executor state. v4 posted at [1] has the following diffs:
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -724,6 +724,9 @@ typedef struct EState
*/
List *es_insert_pending_result_relations;
List *es_insert_pending_modifytables;
+
+ int es_workers_launched;
+ int es_workers_planned;
} EState;

Your v2 posted on this thread has that:
@@ -707,6 +707,12 @@ typedef struct EState
struct EPQState *es_epq_active;

bool es_use_parallel_mode; /* can we use parallel workers? */
+ bool es_used_parallel_mode; /* was executed in parallel */
+ int es_parallelized_workers_launched;
+ int es_parallelized_workers_planned;
+ int es_parallelized_nodes; /* # of parallelized nodes */
+ int es_parallelized_nodes_all_workers; /* # of nodes with all workers launched */
+ int es_parallelized_nodes_no_worker; /* # of nodes with no workers launched */

es_parallelized_workers_launched and es_workers_launched are the same
thing in both.

> On this, I would agree with you. They are not that particularly useful to
> get better setting for parallel GUCs. I can drop them if you want.

Yep. I would remove them for now. This leads to more bloat.

> Did this on the v2 version of the patch (attached here).
>
> Thanks for your review. If you want the parallelized_queries_launched
> column and the parallelized_nodes_* columns dropped, I can do that on a v3
> patch.

I'd recommend to split that into more independent patches:
- Introduce the two counters in EState with the incrementations done
in nodeGatherMerge.c and nodeGather.c (mentioned that at [2], you may
want to coordinate with Benoit to avoid duplicating the work).
- Expand pg_stat_statements to use them for DMLs, SELECTs, well where
they matter.
- Look at expanding that for utilities that can do parallel jobs:
CREATE INDEX and VACUUM, but this has lower priority to me, and this
can reuse the same counters as the ones added by patch 2.

[1]: https://www.postgresql.org/message-id/6ecad3ad-835c-486c-9ebd-da87a9a97634@dalibo.com
[2]: https://www.postgresql.org/message-id/Zv46wTMjLTuu2t9J@paquier.xyz
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-10-07 00:41:36 Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes
Previous Message Thomas Munro 2024-10-06 23:50:56 Re: Converting tab-complete.c's else-if chain to a switch