Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes

From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes
Date: 2024-10-07 06:34:58
Message-ID: CAECtzeWb15z6UJcvRx14Pp0Z5YC5KYq2rGd__W_Y=QH-QFF4ug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Le lun. 7 oct. 2024 à 02:41, Michael Paquier <michael(at)paquier(dot)xyz> a écrit :

> On Mon, Oct 07, 2024 at 12:43:18AM +0300, Alena Rybakina wrote:
> > Maybe I'm not aware of the whole context of the thread and maybe my
> > questions will seem a bit stupid, but honestly
> > it's not entirely clear to me how this statistics will help to adjust the
> > number of parallel workers.
> > We may have situations when during overestimation of the cardinality
> during
> > query optimization a several number of parallel workers were
> unjustifiably
> > generated and vice versa -
> > due to a high workload only a few number of workers were generated.
> > How do we identify such cases so as not to increase or decrease the
> number
> > of parallel workers when it is not necessary?
>
> Well. For spiky workloads, only these numbers are not going to help.
> If you can map them with the number of times a query related to these
> tables has been called, something that pg_stat_statements would be
> able to show more information about.
>
> FWIW, I have doubts that these numbers attached to this portion of the
> system are always useful. For OLTP workloads, parallel workers would
> unlikely be spawned because even with JOINs we won't work with a high
> number of tuples that require them. This could be interested with
> analytics, however complex query sequences mean that we'd still need
> to look at all the plans involving the relations where there is an
> unbalance of planned/spawned workers, because these can usually
> involve quite a few gather nodes. At the end of the day, it seems to
> me that we would still need data that involves statements to track
> down specific plans that are starving. If your application does not
> have that many statements, looking at individial plans is OK, but if
> you have hundreds of them to dig into, this is time-consuming and
> stats at table/index level don't offer data in terms of stuff that
> stands out and needs adjustments.
>
> And this is without the argument of bloating more the stats entries
> for each table, even if it matters less now that these stats are in
> shmem lately.
>

We need granularity because we have granularity in the config. There is
pg_stat_database because it gives the whole picture and it is easier to
monitor. And then, there is pg_stat_statements to analyze problematic
statements. And finally there is pg_stat_all* because you can set
parallel_workers on a specific table.

Anyway, offering various ways of getting the same information is not
unheard of. Pretty much like temp_files/temp_bytes in pg_stat_database,
temp_blks_read/temp_blks_written in pg_stat_statements and log_temp_files
in log files if you ask me :)

--
Guillaume.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-10-07 06:50:26 Re: Add minimal C example and SQL registration example for custom table access methods.
Previous Message Hunaid Sohail 2024-10-07 06:17:01 Re: Psql meta-command conninfo+