Re: Track the amount of time waiting due to cost_delay

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Track the amount of time waiting due to cost_delay
Date: 2024-06-10 17:48:22
Message-ID: Zmc8Zt2G1axRuga4@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Mon, Jun 10, 2024 at 10:36:42AM -0500, Nathan Bossart wrote:
> On Mon, Jun 10, 2024 at 06:05:13AM +0000, Bertrand Drouvot wrote:
> > During the last pgconf.dev I attended Robert´s presentation about autovacuum and
> > it made me remember of an idea I had some time ago: $SUBJECT
>
> This sounds like useful information to me.

Thanks for looking at it!

> I wonder if we should also
> surface the effective cost limit for each autovacuum worker.

I'm not sure about it as I think that it could be misleading: one could query
pg_stat_progress_vacuum and conclude that the time_delayed he is seeing is
due to _this_ cost_limit. But that's not necessary true as the cost_limit could
have changed multiple times since the vacuum started. So, unless there is
frequent sampling on pg_stat_progress_vacuum, displaying the time_delayed and
the cost_limit could be misleadind IMHO.

> > Currently one can change [autovacuum_]vacuum_cost_delay and
> > [auto vacuum]vacuum_cost_limit but has no reliable way to measure the impact of
> > the changes on the vacuum duration: one could observe the vacuum duration
> > variation but the correlation to the changes is not accurate (as many others
> > factors could impact the vacuum duration (load on the system, i/o latency,...)).
>
> IIUC you'd need to get information from both pg_stat_progress_vacuum and
> pg_stat_activity in order to know what percentage of time was being spent
> in cost delay. Is that how you'd expect for this to be used in practice?

Yeah, one could use a query such as:

select p.*, now() - a.xact_start as duration from pg_stat_progress_vacuum p JOIN pg_stat_activity a using (pid)

for example. Worth to provide an example somewhere in the doc?

> > pgstat_report_wait_start(WAIT_EVENT_VACUUM_DELAY);
> > pg_usleep(msec * 1000);
> > pgstat_report_wait_end();
> > + /* Report the amount of time we slept */
> > + if (VacuumSharedCostBalance != NULL)
> > + pgstat_progress_parallel_incr_param(PROGRESS_VACUUM_TIME_DELAYED, msec);
> > + else
> > + pgstat_progress_incr_param(PROGRESS_VACUUM_TIME_DELAYED, msec);
>
> Hm. Should we measure the actual time spent sleeping, or is a rough
> estimate good enough? I believe pg_usleep() might return early (e.g., if
> the process is signaled) or late, so this field could end up being
> inaccurate, although probably not by much. If we're okay with millisecond
> granularity, my first instinct is that what you've proposed is fine, but I
> figured I'd bring it up anyway.

Thanks for bringing that up! I had the same thought when writing the code and
came to the same conclusion. I think that's a good enough estimation and specially
during a long running vacuum (which is probably the case where users care the
most).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2024-06-10 18:12:12 libpq contention due to gss even when not using gss
Previous Message Andrew Dunstan 2024-06-10 17:27:27 Re: Non-text mode for pg_dumpall