Re: Should vacuum process config file reload more often

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Should vacuum process config file reload more often
Date: 2023-03-24 05:21:03
Message-ID: CAD21AoCn3-tAB1HgNXY-_B9yaRES0u2gyifw=ABchNV8FkswZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 24, 2023 at 9:27 AM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Thu, Mar 23, 2023 at 2:09 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > On Sun, Mar 19, 2023 at 7:47 AM Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
> > Do we need to calculate the number of workers running with
> > nworkers_for_balance by iterating over the running worker list? I
> > guess autovacuum workers can increment/decrement it at the beginning
> > and end of vacuum.
>
> I don't think we can do that because if a worker crashes, we have no way
> of knowing if it had incremented or decremented the number, so we can't
> adjust for it.

What kind of crash are you concerned about? If a worker raises an
ERROR, we can catch it in PG_CATCH() block. If it's a FATAL, we can do
that in FreeWorkerInfo(). A PANIC error ends up crashing the entire
server.

>
> > > > > > > Also not sure how the patch interacts with failsafe autovac and parallel
> > > > > > > vacuum.
> > > > > >
> > > > > > Good point.
> > > > > >
> > > > > > When entering the failsafe mode, we disable the vacuum delays (see
> > > > > > lazy_check_wraparound_failsafe()). We need to keep disabling the
> > > > > > vacuum delays even after reloading the config file. One idea is to
> > > > > > have another global variable indicating we're in the failsafe mode.
> > > > > > vacuum_delay_point() doesn't update VacuumCostActive if the flag is
> > > > > > true.
> > > > >
> > > > > I think we might not need to do this. Other than in
> > > > > lazy_check_wraparound_failsafe(), VacuumCostActive is only updated in
> > > > > two places:
> > > > >
> > > > > 1) in vacuum() which autovacuum will call per table. And failsafe is
> > > > > reset per table as well.
> > > > >
> > > > > 2) in vacuum_delay_point(), but, since VacuumCostActive will already be
> > > > > false when we enter vacuum_delay_point() the next time after
> > > > > lazy_check_wraparound_failsafe(), we won't set VacuumCostActive there.
> > > >
> > > > Indeed. But does it mean that there is no code path to turn
> > > > vacuum-delay on, even when vacuum_cost_delay is updated from 0 to
> > > > non-0?
> > >
> > > Ah yes! Good point. This is true.
> > > I'm not sure how to cheaply allow for re-enabling delays after disabling
> > > them in the middle of a table vacuum.
> > >
> > > I don't see a way around checking if we need to reload the config file
> > > on every call to vacuum_delay_point() (currently, we are only doing this
> > > when we have to wait anyway). It seems expensive to do this check every
> > > time. If we do do this, we would update VacuumCostActive when updating
> > > VacuumCostDelay, and we would need a global variable keeping the
> > > failsafe status, as you mentioned.
> > >
> > > It could be okay to say that you can only disable cost-based delays in
> > > the middle of vacuuming a table (i.e. you cannot enable them if they are
> > > already disabled until you start vacuuming the next table). Though maybe
> > > it is weird that you can increase the delay but not re-enable it...
> >
> > On Mon, Mar 20, 2023 at 1:48 AM Melanie Plageman
> > <melanieplageman(at)gmail(dot)com> wrote:
> > > So, I thought about it some more, and I think it is a bit odd that you
> > > can increase the delay and limit but not re-enable them if they were
> > > disabled. And, perhaps it would be okay to check ConfigReloadPending at
> > > the top of vacuum_delay_point() instead of only after sleeping. It is
> > > just one more branch. We can check if VacuumCostActive is false after
> > > checking if we should reload and doing so if needed and return early.
> > > I've implemented that in attached v6.
> > >
> > > I added in the global we discussed for VacuumFailsafeActive. If we keep
> > > it, we can probably remove the one in LVRelState -- as it seems
> > > redundant. Let me know what you think.
> >
> > I think the following change is related:
> >
> > - if (!VacuumCostActive || InterruptPending)
> > + if (InterruptPending || VacuumFailsafeActive ||
> > + (!VacuumCostActive && !ConfigReloadPending))
> > return;
> >
> > + /*
> > + * Reload the configuration file if requested. This allows changes to
> > + * [autovacuum_]vacuum_cost_limit and [autovacuum_]vacuum_cost_delay to
> > + * take effect while a table is being vacuumed or analyzed.
> > + */
> > + if (ConfigReloadPending && !analyze_in_outer_xact)
> > + {
> > + ConfigReloadPending = false;
> > + ProcessConfigFile(PGC_SIGHUP);
> > + AutoVacuumUpdateDelay();
> > + AutoVacuumUpdateLimit();
> > + }
> >
> > It makes sense to me that we need to reload the config file even when
> > vacuum-delay is disabled. But I think it's not convenient for users
> > that we don't reload the configuration file once the failsafe is
> > triggered. I think users might want to change some GUCs such as
> > log_autovacuum_min_duration.
>
> Ah, okay. Attached v7 has this change (it reloads even if failsafe is
> active).
>
> > > And, I was wondering if it was worth trying to split up the part that
> > > reloads the config file and all of the autovacuum stuff. The reloading
> > > of the config file by itself won't actually result in autovacuum workers
> > > having updated cost delays because of them overwriting it with
> > > wi_cost_delay, but it will allow VACUUM to have those updated values.
> >
> > It makes sense to me to have changes for overhauling the rebalance
> > mechanism in a separate patch.
> >
> > Looking back at the original concern you mentioned[1]:
> >
> > speed up long-running vacuum of a large table by
> > decreasing autovacuum_vacuum_cost_delay/vacuum_cost_delay, however the
> > config file is only reloaded between tables (for autovacuum) or after
> > the statement (for explicit vacuum).
> >
> > does it make sense to have autovac_balance_cost() update workers'
> > wi_cost_delay too? Autovacuum launcher already reloads the config file
> > and does the rebalance. So I thought autovac_balance_cost() can update
> > the cost_delay as well, and this might be a minimal change to deal
> > with your concern. This doesn't have the effect for manual VACUUM but
> > since vacuum delay is disabled by default it won't be a big problem.
> > As for manual VACUUMs, we would need to reload the config file in
> > vacuum_delay_point() as the part of your patch does. Overhauling the
> > rebalance mechanism would be another patch to improve it further.
>
> So, we can't do this without acquiring an access shared lock on every
> call to vacuum_delay_point() because cost delay is a double.
>
> I will work on a patchset with separate commits for reloading the config
> file, though (with autovac not benefitting in the first commit).
>
> On Thu, Mar 23, 2023 at 12:24 PM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> >
> > +bool VacuumFailsafeActive = false;
> > This needs documentation, how it's used and how it relates to failsafe_active
> > in LVRelState (which it might replace(?), but until then).
>
> Thanks! I've removed LVRelState->failsafe_active.
>
> I've also separated the VacuumFailsafeActive change into its own commit.

@@ -492,6 +493,7 @@ vacuum(List *relations, VacuumParams *params,

in_vacuum = true;
VacuumCostActive = (VacuumCostDelay > 0);
+ VacuumFailsafeActive = false;
VacuumCostBalance = 0;
VacuumPageHit = 0;
VacuumPageMiss = 0;

I think we need to reset VacuumFailsafeActive also in PG_FINALLY()
block in vacuum().

One comment on 0002 patch:

+ /*
+ * Reload the configuration file if requested. This allows changes to
+ * [autovacuum_]vacuum_cost_limit and [autovacuum_]vacuum_cost_delay to
+ * take effect while a table is being vacuumed or analyzed.
+ */
+ if (ConfigReloadPending && !analyze_in_outer_xact)
+ {
+ ConfigReloadPending = false;
+ ProcessConfigFile(PGC_SIGHUP);
+ AutoVacuumUpdateDelay();
+ AutoVacuumUpdateLimit();
+ }

I think we need comments on why we don't reload the config file if
we're analyzing a table in a user transaction.

>
> > I need another few readthroughs to figure out of VacuumFailsafeActive does what
> > I think it does, and should be doing, but in general I think this is a good
> > idea and a patch in good condition close to being committable.

Another approach would be to make VacuumCostActive a ternary value:
on, off, and never. When we trigger the failsafe mode we switch it to
never, meaning that it never becomes active even after reloading the
config file. A good point is that we don't need to add a new global
variable, but I'm not sure it's better than the current approach.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Brar Piening 2023-03-24 05:48:17 Re: doc: add missing "id" attributes to extension packaging page
Previous Message Michael Paquier 2023-03-24 05:18:02 Re: [PoC] Let libpq reject unexpected authentication requests