Re: Parallel workers stats in pg_stat_database

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Benoit Lobréau <benoit(dot)lobreau(at)dalibo(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel workers stats in pg_stat_database
Date: 2024-10-03 06:33:37
Message-ID: Zv46wTMjLTuu2t9J@paquier.xyz
Lists: pgsql-hackers

On Wed, Oct 02, 2024 at 11:12:37AM +0200, Benoit Lobréau wrote:
> My colleagues and I had a discussion about what could be done to improve
> parallelism observability in PostgreSQL [0]. We thought about several
> places to do it for several use cases.
>
> Guillaume Lelarge worked on pg_stat_statements [1].

Thanks, I had missed that. I will post a reply there. There is a good
overlap with everything you are doing here, because each one of you
wishes to track more data in the executor state and push it to a
different part of the system, be it a system view or an extension.

Tracking the number of workers launched and planned in the executor
state is the strict minimum for a lot of these things, as far as I can
see. Once the nodes are able to push this data, then extensions can
feed on it the way they want. So that's a good idea on its own, and
covers two of the counters posted here:
https://www.postgresql.org/message-id/CAECtzeWtTGOK0UgKXdDGpfTVSa5bd_VbUt6K6xn8P7X%2B_dZqKw%40mail.gmail.com

Could you split the patch based on that? I'd recommend moving
es_workers_launched and es_workers_planned closer to the top, say
near es_total_processed, and documenting what these counters are for.
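
As a rough illustration of the idea (not the actual patch), the two
counters could sit next to es_total_processed and be bumped once per
Gather-like node. SketchEState and sketch_count_gather_workers() are
hypothetical names used only here, and the field types are assumed;
only the three es_* names come from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the executor state; NOT the real EState. */
typedef struct SketchEState
{
    uint64_t    es_total_processed; /* total # of tuples processed */

    /*
     * Parallel workers planned and actually launched, summed over all
     * the Gather-like nodes of the query.  Types are an assumption.
     */
    int         es_workers_planned;
    int         es_workers_launched;
} SketchEState;

/*
 * Hypothetical helper: a Gather-like node would call this once after
 * trying to launch its workers.
 */
static inline void
sketch_count_gather_workers(SketchEState *estate, int planned, int launched)
{
    /* A node cannot launch more workers than were planned for it. */
    assert(planned >= 0 && launched >= 0 && launched <= planned);

    estate->es_workers_planned += planned;
    estate->es_workers_launched += launched;
}
```

With the totals kept in one place, any consumer (a system view or an
extension) could read both fields at the end of execution and decide
on its own how to store them.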

After that comes the problem of where to push this data.

> Lastly the number would be more precise/easier to make sense of, since
> pg_stat_statement has a limited size.

That's an upper bound that can be configured, with
pg_stat_statements.max.

When looking for query-level patterns or specific SET tuning, using
query-level data speaks more than this data pushed at database level.
TBH, I am +-0 about pushing this data to pg_stat_database so that we
would be able to tune database-level GUCs. That does not help with
SET commands tweaking the number of workers to use. Well, perhaps few
rely on SET and most rely on the system-level GUCs in their
applications, meaning that I'm wrong, making your point about
publishing this data at database-level better, but I'm not really
sure. If others have an opinion, feel free.

Anyway, what I am sure of is that publishing the same set of data
everywhere leads to bloat, and I'd rather avoid that. Aggregating
the query-level data also gives an impression of the whole database,
equivalent to what would be stored in pg_stat_database, assuming that
the load is steady. Your point about pg_stat_statements not always
being enabled is also true, even if some cloud vendors enable it by
default.
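
To make the aggregation argument concrete, here is a small sketch
that sums per-query counters into a per-database total. All the type
and function names (SketchQueryEntry, SketchDbTotals,
sketch_sum_for_database) are hypothetical, as is the layout of the
entries; it only models the idea and is not pg_stat_statements code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-query entry, loosely modeled on one stats row. */
typedef struct SketchQueryEntry
{
    uint32_t    dbid;               /* database the query ran in */
    int64_t     workers_planned;
    int64_t     workers_launched;
} SketchQueryEntry;

/* Hypothetical database-level totals derived from the entries. */
typedef struct SketchDbTotals
{
    int64_t     workers_planned;
    int64_t     workers_launched;
} SketchDbTotals;

/* Sum the counters of every entry that belongs to the given database. */
static inline SketchDbTotals
sketch_sum_for_database(const SketchQueryEntry *entries, size_t n,
                        uint32_t dbid)
{
    SketchDbTotals totals = {0, 0};

    assert(entries != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (entries[i].dbid != dbid)
            continue;
        totals.workers_planned += entries[i].workers_planned;
        totals.workers_launched += entries[i].workers_launched;
    }
    return totals;
}
```

Such a sum only covers the entries that are still tracked, so it is
an approximation of a true database-level counter rather than a
replacement for it.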

Table/index-level data can be really interesting because we can
cross-check what's happening for more complex queries if there are
many gather nodes with complex JOINs.

Utilities (vacuum, btree, brin) are straightforward and best tracked
at query level, making pg_stat_statements their best match. There is
no need for four counters at this level: two are enough, because
utility and non-utility statements are separated by their PlannedStmt
and so get separate entries in PGSS.
--
Michael
