From: | Scott Mead <scott(at)meads(dot)us> |
---|---|
To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | "Mead, Scott" <meads(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [BUG] Autovacuum not dynamically decreasing cost_limit and cost_delay |
Date: | 2021-10-26 15:23:41 |
Message-ID: | CAJsHxiCtvWYAsK3E7FGw_Lpdbh7eF3St2uptsPP-Uz5VJgjjug@mail.gmail.com |
Lists: | pgsql-bugs pgsql-hackers |
On Wed, May 26, 2021 at 4:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:
> On Wed, Apr 14, 2021 at 11:17 PM Mead, Scott <meads(at)amazon(dot)com> wrote:
> >
> >
> >
> > > On Mar 1, 2021, at 8:43 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
> > >
> > >
> > >
> > > On Mon, Feb 8, 2021 at 11:49 PM Mead, Scott <meads(at)amazon(dot)com> wrote:
> > >>
> > >> Hello,
> > >> I recently looked at what it would take to make a running
> autovacuum pick up a change to either cost_delay or cost_limit. Users
> frequently have a conservative value set and then wish to change it
> when autovacuum initiates a freeze on a relation. Most users only
> find out that a vacuum is running ‘to prevent wraparound’ after it has
> started, which means that if they want the vacuum to take advantage of
> more I/O, they need to stop and then restart the currently running vacuum
> (after reloading the GUCs).
> > >>
> > >> Initially, my goal was to determine the feasibility of making this
> dynamic. I added debug code to vacuum.c:vacuum_delay_point(void) and found
> that changes to cost_delay and cost_limit are already processed by a
> running vacuum. However, there was a bug preventing cost_delay or
> cost_limit from being configured to allow higher throughput.
> > >>
> > >> I believe this is a bug because currently, autovacuum will
> dynamically detect and increase the cost_limit or cost_delay, but it can
> never decrease those values beyond their setting when the vacuum began.
> The current behavior is for vacuum to limit the maximum throughput of
> currently running vacuum processes to the cost_limit that was set when the
> vacuum process began.
> > >
> > > Thanks for your report.
> > >
> > > I've not looked at the patch yet, but I agree that the calculation of
> > > the autovacuum cost delay does not seem to work correctly if
> > > vacuum-delay-related parameters (e.g., autovacuum_vacuum_cost_delay)
> > > are changed while a table is being vacuumed, in order to speed up
> > > running autovacuums. Here is my analysis:
> >
> >
> > I appreciate your in-depth analysis and will comment in-line. That
> said, I still think it’s important that the attached patch is applied. As
> it is today, a few simple lines of code prevent users from increasing
> the throughput of running vacuums without having to cancel them first.
> >
> > The patch that I’ve provided allows users to decrease their
> vacuum_cost_delay and get an immediate performance boost for their
> running vacuum jobs.
> >
> >
> > >
> > > Suppose we have the following parameters and 3 autovacuum workers are
> > > running on different tables:
> > >
> > > autovacuum_vacuum_cost_delay = 100
> > > autovacuum_vacuum_cost_limit = 100
> > >
> > > Vacuum cost-based delay parameters for each worker are as follows:
> > >
> > > worker->wi_cost_limit_base = 100
> > > worker->wi_cost_limit = 66
> > > worker->wi_cost_delay = 100
>
> Sorry, worker->wi_cost_limit should be 33.
>
> > >
> > > Each running autovacuum has "wi_cost_limit = 66" because the total
> > > limit (100) is equally rationed. Another point is that the total
> > > wi_cost_limit (198 = 66*3) is less than autovacuum_vacuum_cost_limit,
> > > 100. Both of these are fine.
>
> So the total wi_cost_limit, 99, is less than autovacuum_vacuum_cost_limit,
> 100.
>
> > >
> > > Here let's change autovacuum_vacuum_cost_delay/limit value to speed up
> > > running autovacuums.
> > >
> > > Case 1: increasing autovacuum_vacuum_cost_limit to 1000.
> > >
> > > After reloading the configuration file, vacuum cost-based delay
> > > parameters for each worker become as follows:
> > >
> > > worker->wi_cost_limit_base = 100
> > > worker->wi_cost_limit = 100
> > > worker->wi_cost_delay = 100
> > >
> > > If we rationed autovacuum_vacuum_cost_limit, 1000, to 3 workers, it
> > > would be 333. But since we cap it by wi_cost_limit_base, the
> > > wi_cost_limit is 100. I think this is what Mead reported here.
> >
> >
> > Yes, this is exactly correct. The cost_limit is capped at the
> cost_limit that was in effect when the running vacuum started. My patch
> changes this cap to the maximum allowed cost_limit (10,000).
>
> The comment of worker's limit calculation says:
>
> /*
>  * We put a lower bound of 1 on the cost_limit, to avoid division-
>  * by-zero in the vacuum code. Also, in case of roundoff trouble
>  * in these calculations, let's be sure we don't ever set
>  * cost_limit to more than the base value.
>  */
> worker->wi_cost_limit = Max(Min(limit,
>                                 worker->wi_cost_limit_base),
>                             1);
>
> If we use the max cost_limit as the upper bound here, the worker's
> limit could unnecessarily be higher than the base value in case of
> roundoff trouble? I think that the problem here is rather that we
> don't update wi_cost_limit_base and wi_cost_delay when rebalancing the
> cost.
>
Currently, vacuum always limits you to the cost_limit_base from the time
that your vacuum started. I'm not sure why; I don't believe it's rounding
related, because the rest of the rebalancing code works properly. ISTM that
allowing the updated cost_limit is a simple solution, since the rebalance
code will automatically take it into account.
>
> Regards,
>
> --
> Masahiko Sawada
> EDB: https://www.enterprisedb.com/
>
>
>
--
Scott Mead
scott(at)meads(dot)us