Re: Track the amount of time waiting due to cost_delay

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Track the amount of time waiting due to cost_delay
Date: 2024-06-24 10:50:13
Message-ID: ZnlPZZZJCRu/8fka@ip-10-97-1-34.eu-west-3.compute.internal
Lists: pgsql-hackers

Hi,

On Sat, Jun 22, 2024 at 12:48:33PM +0000, Bertrand Drouvot wrote:
> 1. vacuuming indexes time has been longer on master because with v2, the leader
> has been interrupted 342605 times while waiting, then making v2 "faster".
>
> 2. the leader being interrupted while waiting is also already happening on master
> due to the pgstat_progress_parallel_incr_param() calls in
> parallel_vacuum_process_one_index() (that have been added in
> 46ebdfe164). It has been the case "only" 36 times during my test case.
>
> I think that 2. is less of a concern but I think that 1. is something that needs
> to be addressed because the leader process is not honouring its cost delay wait
> time in a noticeable way (at least during my test case).
>
> I did not think of a proposal yet, just sharing my investigation as to why
> v2 has been faster than master during the vacuuming indexes phase.

I think that a reasonable approach is to make the reporting from the parallel
workers to the leader less aggressive (i.e., make it occur less frequently).

Please find attached v3, which:

- ensures that each parallel worker leaves at least 1 second between two
reports to the leader.

- ensures that the reported delay time is still correct (the delay time
accumulated between two reports is tracked and included in the next report).

- does not add any extra pg_clock_gettime_ns() calls (compared to v2).
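To make the mechanism above concrete, here is a minimal, standalone C sketch of per-worker throttled reporting. It is not the code from the v3 patch: the names (WorkerDelayState, worker_after_sleep(), simulate_worker(), REPORT_INTERVAL_NS, ...) are hypothetical, the leader is simulated by a counter, and timestamps are passed in rather than read with pg_clock_gettime_ns(). The throttle check reuses the end-of-sleep timestamp that is already needed to measure the delay, so it costs no extra clock read, and any delay not yet reported is flushed when the worker finishes so that the total stays correct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: minimum gap between two reports from one worker (1 second). */
#define REPORT_INTERVAL_NS INT64_C(1000000000)

typedef struct WorkerDelayState
{
    int64_t pending_delay_ns;   /* delay accumulated since the last report */
    int64_t last_report_ns;     /* timestamp of the last report */
} WorkerDelayState;

/* Stand-in for the leader: counts how often it is interrupted. */
typedef struct LeaderStats
{
    int64_t total_delay_ns;
    int64_t interrupts;
} LeaderStats;

static void
report_to_leader(LeaderStats *leader, int64_t delay_ns)
{
    leader->total_delay_ns += delay_ns;
    leader->interrupts++;
}

/*
 * Called after each cost-based sleep.  delay_end_ns is the timestamp already
 * taken to measure the sleep, so the throttling check needs no extra clock read.
 */
static void
worker_after_sleep(WorkerDelayState *w, LeaderStats *leader,
                   int64_t delay_start_ns, int64_t delay_end_ns)
{
    assert(delay_end_ns >= delay_start_ns);
    w->pending_delay_ns += delay_end_ns - delay_start_ns;

    /* Report at most once per interval; otherwise keep accumulating. */
    if (delay_end_ns - w->last_report_ns >= REPORT_INTERVAL_NS)
    {
        report_to_leader(leader, w->pending_delay_ns);
        w->pending_delay_ns = 0;
        w->last_report_ns = delay_end_ns;
    }
}

/* When the worker finishes, flush the delay that was not reported yet. */
static void
worker_flush(WorkerDelayState *w, LeaderStats *leader)
{
    if (w->pending_delay_ns > 0)
    {
        report_to_leader(leader, w->pending_delay_ns);
        w->pending_delay_ns = 0;
    }
}

/*
 * Simulate one worker doing nsleeps iterations of work_ns of work followed by
 * a sleep_ns cost delay, starting at time 0 (so the first periodic report
 * needs a full interval to elapse).
 */
static LeaderStats
simulate_worker(int nsleeps, int64_t work_ns, int64_t sleep_ns)
{
    WorkerDelayState w = {0, 0};
    LeaderStats leader = {0, 0};
    int64_t now = 0;

    for (int i = 0; i < nsleeps; i++)
    {
        int64_t start;

        now += work_ns;
        start = now;
        now += sleep_ns;
        worker_after_sleep(&w, &leader, start, now);
    }
    worker_flush(&w, &leader);
    return leader;
}
```

For example, simulate_worker(10000, 1000000, 2000000) (10,000 sleeps of 2 ms, each after 1 ms of work, so about 30 seconds) interrupts the simulated leader 30 times (29 periodic reports plus the final flush) instead of 10,000, while the reported total delay stays at 20 seconds.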

Remarks:

1. A purely time-based approach to throttling the parallel workers' reporting
sounds reasonable. I don't think that the number of parallel workers has to
come into play, as:

1.1) the more parallel workers are used, the smaller the leader's impact on
the vacuum index phase duration/workload (because the work is distributed
across more processes).

1.2) the fewer parallel workers there are, the less often the leader will be
interrupted (fewer parallel workers would report their delayed time).

2. The throttling is not based on the cost limit, as that value is distributed
proportionally among the parallel workers (so we're back to the previous point).

3. The throttling is not based on the actual cost delay value because the leader
could be interrupted at the beginning, the middle, or any other part of the wait,
and we are more interested in the frequency of the interrupts.

4. A 1-second reporting "throttling" looks like a reasonable threshold, as:

4.1 the idea is to have a significant impact when the leader could otherwise be
interrupted, say, hundreds or thousands of times per second.

4.2 it does not make much sense for any tool to sample pg_stat_progress_vacuum
multiple times per second (so a one-second reporting granularity seems OK).
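As a rough sanity check of the remarks above, a time-based throttle bounds how many reports the leader can receive independently of how many cost-delay sleeps each worker performs. The helper below is hypothetical (not part of the patch) and assumes a simple model: a worker's first report needs a full interval since it started, and each worker sends one final flush when it finishes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: upper bound on the number of delay reports (and thus
 * leader interrupts) when each worker reports at most once per interval_ns,
 * its first report needs a full interval since it started, and it sends one
 * final flush when it finishes.
 */
static int64_t
max_leader_interrupts(int nworkers, int64_t elapsed_ns, int64_t interval_ns)
{
    assert(nworkers >= 0 && elapsed_ns >= 0 && interval_ns > 0);

    /* floor(elapsed / interval) periodic reports + 1 flush, per worker */
    return (int64_t) nworkers * (elapsed_ns / interval_ns + 1);
}
```

Under this model the bound grows with the number of workers and the phase duration, but not with the number of sleeps, whereas without throttling every worker sleep can interrupt the leader.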

With the attached v3 applied, during my test case:

- the leader has been interrupted about 2500 times (instead of about 345000
times with v2)

- the vacuum index phase duration is very close to master's (it has been
4 seconds faster over a duration of 8 minutes 40 seconds, instead of 3
minutes faster with v2).

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v3-0001-Report-the-total-amount-of-time-that-vacuum-has-b.patch text/x-diff 8.4 KB
