Re: Track the amount of time waiting due to cost_delay

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Track the amount of time waiting due to cost_delay
Date: 2024-12-09 11:48:30
Message-ID: CAFiTN-t+=7v1zObd1v2hpvvbs9S0WUye8R8cj4Ddc4KP2=4K8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 9, 2024 at 2:51 PM Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
>
> On 2024-12-06 18:31, Bertrand Drouvot wrote:
> > Hi,
> >
> > On Thu, Dec 05, 2024 at 10:43:51AM +0000, Bertrand Drouvot wrote:
> >> Yeah, people would likely use this new field to monitor long running
> >> vacuum.
> >> Long enough that this error should be acceptable. Do you agree?
> >
> > OTOH, adding the 100% accuracy looks as simple as v9-0002 attached
> > (0001 is
> > same as for v8), so I think we should provide it.
>
This Idea looks good to me. Here are some comments

1.
+ Total amount of time spent in milliseconds waiting due to
<xref linkend="guc-vacuum-cost-delay"/>
+ or <xref linkend="guc-autovacuum-vacuum-cost-delay"/>. In case
of parallel
+ vacuum the reported time is across all the workers and the leader. The
+ workers update the column no more frequently than once per second, so it
+ could show slightly old values.
+ </para></entry>

I think this waiting is influenced due to cost delay as well as cost
limit GUCs because here we are counting total wait time and that very
much depends upon how frequently we are waiting and that's completely
driven by cost limit. Isn't it?

2.
+ if (IsParallelWorker())
+ {
+ instr_time time_since_last_report;
+
+ INSTR_TIME_SET_ZERO(time_since_last_report);
+ INSTR_TIME_ACCUM_DIFF(time_since_last_report, delay_end,
+ last_report_time);
+ nap_time_since_last_report += INSTR_TIME_GET_MILLISEC(delayed_time);
+
+ if (INSTR_TIME_GET_MILLISEC(time_since_last_report) >
WORKER_REPORT_DELAY_INTERVAL)
+ {
+ pgstat_progress_parallel_incr_param(PROGRESS_VACUUM_TIME_DELAYED,
+ nap_time_since_last_report);
+ nap_time_since_last_report = 0;
+ last_report_time = delay_end;
+ }
+ }

Does it make sense to track this "nap_time_since_last_report" in a
shared variable instead of local in individual workers and whenever
the shared variable crosses WORKER_REPORT_DELAY_INTERVAL we can report
this would avoid individual reporting from different workers. Since
we are already computing the cost balance in shared variable i.e.
VacuumSharedCostBalance, or do you think it will complicate the code?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kirill Reshke 2024-12-09 11:50:16 Re: WARNING: missing lock on database "postgres" (OID 5) @ TID (0,4)
Previous Message John Naylor 2024-12-09 11:39:36 Re: Proposal for Updating CRC32C with AVX-512 Algorithm.