From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Track the amount of time waiting due to cost_delay |
Date: | 2024-06-10 17:48:22 |
Message-ID: | Zmc8Zt2G1axRuga4@ip-10-97-1-34.eu-west-3.compute.internal |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Mon, Jun 10, 2024 at 10:36:42AM -0500, Nathan Bossart wrote:
> On Mon, Jun 10, 2024 at 06:05:13AM +0000, Bertrand Drouvot wrote:
> > During the last pgconf.dev I attended Robert´s presentation about autovacuum and
> > it made me remember of an idea I had some time ago: $SUBJECT
>
> This sounds like useful information to me.
Thanks for looking at it!
> I wonder if we should also
> surface the effective cost limit for each autovacuum worker.
I'm not sure about it as I think that it could be misleading: one could query
pg_stat_progress_vacuum and conclude that the time_delayed he is seeing is
due to _this_ cost_limit. But that's not necessary true as the cost_limit could
have changed multiple times since the vacuum started. So, unless there is
frequent sampling on pg_stat_progress_vacuum, displaying the time_delayed and
the cost_limit could be misleadind IMHO.
> > Currently one can change [autovacuum_]vacuum_cost_delay and
> > [auto vacuum]vacuum_cost_limit but has no reliable way to measure the impact of
> > the changes on the vacuum duration: one could observe the vacuum duration
> > variation but the correlation to the changes is not accurate (as many others
> > factors could impact the vacuum duration (load on the system, i/o latency,...)).
>
> IIUC you'd need to get information from both pg_stat_progress_vacuum and
> pg_stat_activity in order to know what percentage of time was being spent
> in cost delay. Is that how you'd expect for this to be used in practice?
Yeah, one could use a query such as:
select p.*, now() - a.xact_start as duration from pg_stat_progress_vacuum p JOIN pg_stat_activity a using (pid)
for example. Worth to provide an example somewhere in the doc?
> > pgstat_report_wait_start(WAIT_EVENT_VACUUM_DELAY);
> > pg_usleep(msec * 1000);
> > pgstat_report_wait_end();
> > + /* Report the amount of time we slept */
> > + if (VacuumSharedCostBalance != NULL)
> > + pgstat_progress_parallel_incr_param(PROGRESS_VACUUM_TIME_DELAYED, msec);
> > + else
> > + pgstat_progress_incr_param(PROGRESS_VACUUM_TIME_DELAYED, msec);
>
> Hm. Should we measure the actual time spent sleeping, or is a rough
> estimate good enough? I believe pg_usleep() might return early (e.g., if
> the process is signaled) or late, so this field could end up being
> inaccurate, although probably not by much. If we're okay with millisecond
> granularity, my first instinct is that what you've proposed is fine, but I
> figured I'd bring it up anyway.
Thanks for bringing that up! I had the same thought when writing the code and
came to the same conclusion. I think that's a good enough estimation and specially
during a long running vacuum (which is probably the case where users care the
most).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2024-06-10 18:12:12 | libpq contention due to gss even when not using gss |
Previous Message | Andrew Dunstan | 2024-06-10 17:27:27 | Re: Non-text mode for pg_dumpall |