Re: Add parallel columns for pg_stat_statements

From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add parallel columns for pg_stat_statements
Date: 2024-10-06 13:32:02
Message-ID: CAECtzeWcwRQ6xoZDC6VgOR1q6K=18ahhjU7vgWOzyy2tFwHHEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Michael,

Le jeu. 3 oct. 2024 à 09:15, Michael Paquier <michael(at)paquier(dot)xyz> a écrit :

> On Thu, Aug 29, 2024 at 10:08:23PM +0200, Guillaume Lelarge wrote:
> > This patch was a bit discussed on [1], and with more details on [2]. It's
> > based on another patch sent in 2022 (see [3]). It introduces seven new
> > columns in pg_stat_statements:
> >
> > * parallelized_queries_planned, number of times the query has been
> planned
> > to be parallelized,
> > * parallelized_queries_launched, number of times the query has been
> > executed with parallelization,
>
> Comparing the numbers of workers planned and launched with the number
> of times a query has been called and planned should provide a rather
> good equivalent, no? I am not sure that these two are mandatory to
> have.
>
>
I'm not sure I follow. That would mean that every time a query is executed,
it always gets the same amount of workers. Which is not guaranteed to be
true.

I would agree, though, that parallelized_queries_launched is probably not
that interesting. I could get rid of it if you think it should go away.

> * parallelized_workers_planned, number of parallel workers planned for
> > this query,
> > * parallelized_workers_launched, number of parallel workers executed for
> > this query,
>
> Yep. Definitely OK with these two. There is an overlap with what
> Benoit has sent here when it comes to publish this data to the
> executor state:
>
> https://www.postgresql.org/message-id/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com
>
>
Well, I don't see this as an overlap. Rather more information.

> > * parallelized_nodes, number of parallelized nodes,
> > * parallelized_nodes_all_workers, number of parallelized nodes which had
> > all requested workers,
> >
> > * parallelized_nodes_no_worker, number of parallelized nodes which had
> no
> > requested workers.
>
> I can see why you want to register this extra data on a node-basis,
> but how does that help when it comes to tune the parallel GUCs? We
> cannot control them at node level and the launched/planned ratio
> offers an equivalent of that. Not exactly, but that's enough to get a
> picture if there is a draught.
>
>
On this, I would agree with you. They are not that particularly useful to
get better setting for parallel GUCs. I can drop them if you want.

> A test script (test2.sql) is attached. You can execute it with "psql -Xef
> > test2.sql your_database" (your_database should not contain a t1 table as
> it
> > will be dropped and recreated).
>
> Let's add proper regression tests instead, including
> oldextversions.sql as this bumps the version of the module. See for
> example the tests of 6fd5071909a2 that can force workers to spawn
> for BRIN and btree queries, validating some of the stats published
> here.
>

Did this on the v2 version of the patch (attached here).

Thanks for your review. If you want the parallelized_queries_launched
column and the parallelized_nodes_* columns dropped, I can do that on a v3
patch.

Regards.

--
Guillaume.

Attachment Content-Type Size
v2-0001-Add-parallel-columns-to-pg_stat_statements.patch text/x-patch 30.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2024-10-06 15:36:49 System views for versions reporting
Previous Message Andrew Dunstan 2024-10-06 13:12:59 Re: Should CSV parsing be stricter about mid-field quotes?