Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
Cc: Guillaume Lelarge <guillaume(at)lelarge(dot)info>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes
Date: 2024-10-07 00:41:36
Message-ID: ZwMuQHt42IyMDyxD@paquier.xyz
Lists: pgsql-hackers

On Mon, Oct 07, 2024 at 12:43:18AM +0300, Alena Rybakina wrote:
> Maybe I'm not aware of the whole context of the thread and maybe my
> questions will seem a bit stupid, but honestly
> it's not entirely clear to me how this statistics will help to adjust the
> number of parallel workers.
> We may have situations when during overestimation of the cardinality during
> query optimization a several number of parallel workers were unjustifiably
> generated and vice versa -
> due to a high workload only a few number of workers were generated.
> How do we identify such cases so as not to increase or decrease the number
> of parallel workers when it is not necessary?

Well. For spiky workloads, these numbers alone are not going to help.
They become more useful if you can map them to the number of times a
query related to these tables has been called, something that
pg_stat_statements would be able to show more information about.

FWIW, I have doubts that these numbers attached to this portion of the
system are always useful. For OLTP workloads, parallel workers are
unlikely to be spawned because even with JOINs we won't work with a
high number of tuples that require them. This could be interesting for
analytics, however complex query sequences mean that we'd still need
to look at all the plans involving the relations where there is an
imbalance of planned/spawned workers, because these can usually
involve quite a few gather nodes. At the end of the day, it seems to
me that we would still need data that involves statements to track
down specific plans that are starving. If your application does not
have that many statements, looking at individual plans is OK, but if
you have hundreds of them to dig into, this is time-consuming and
stats at table/index level don't highlight what stands out and needs
adjustment.
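
To illustrate the statement-level triage described above, here is a
minimal sketch, assuming per-statement counts of planned and launched
parallel workers are already available from some source. The
StatementStats type, its field names, and the 20% threshold are all
hypothetical and chosen only for illustration; they are not an
existing PostgreSQL interface.

```python
from dataclasses import dataclass


@dataclass
class StatementStats:
    # Hypothetical per-statement counters; names are illustrative only.
    query: str
    calls: int
    workers_planned: int
    workers_launched: int


def starving_statements(stats, min_shortfall_ratio=0.2):
    """Return (statement, shortfall) pairs, worst shortfall first.

    Shortfall is the fraction of planned workers that were never
    launched. Statements that planned no workers are skipped.
    """
    flagged = []
    for s in stats:
        if s.workers_planned == 0:
            continue  # no parallelism planned, nothing to compare
        shortfall = (s.workers_planned - s.workers_launched) / s.workers_planned
        if shortfall >= min_shortfall_ratio:
            flagged.append((s, shortfall))
    # Sort so the statements missing the largest share of workers come first.
    flagged.sort(key=lambda pair: pair[1], reverse=True)
    return flagged


sample = [
    StatementStats("q1", calls=100, workers_planned=400, workers_launched=400),
    StatementStats("q2", calls=10, workers_planned=40, workers_launched=10),
    StatementStats("q3", calls=5, workers_planned=0, workers_launched=0),
    StatementStats("q4", calls=50, workers_planned=100, workers_launched=70),
]

for stmt, shortfall in starving_statements(sample):
    print(f"{stmt.query}: {shortfall:.0%} of planned workers not launched")
```

With hundreds of statements, a ranking of this kind narrows down which
plans to inspect first, which per-table or per-index counters cannot
do on their own.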

And this is without the argument of bloating the stats entries for
each table even further, even if it matters less now that these stats
have been moved to shmem.
--
Michael
