Re: autovacuum: change priority of the vacuumed tables

From: Marco Garavello <marco(dot)garavello(at)carel(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Jim Nasby <jim(dot)nasby(at)openscg(dot)com>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: autovacuum: change priority of the vacuumed tables
Date: 2022-01-14 12:24:09
Message-ID: CAFp+5ESwDJDL=nVrhMz0X-bBoeBVQtUhboT0X8QQ1QW_jGwL2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello all,

I've just read this thread as I googled for "*how to give to autovacuum
priority on a table*"
I'm looking this because I have large table with lot of inserts/updates,
and to improve the system I want to force the first free autovacuum worker
to work on that table, of course if condition "such table is in
need-to-be-vacuumed-state (depending on *autovacuum_scale_factor*
parameter)" is satisfed.
I found that *autovacuum_scale_factor* can be specified for each table, so
I think this could do the trick, isn't it?
From my understanding specifying it lower on the target table should give
an implicit higher priority to that table.

*Distinti Saluti / *Kind Regards

*Marco Garavello* | IoT Cloud Operations Specialist

Il giorno ven 14 gen 2022 alle ore 12:55 Tomas Vondra <
tomas(dot)vondra(at)2ndquadrant(dot)com> ha scritto:

>
> On 03/03/2018 10:21 PM, Jim Nasby wrote:
> > On 3/3/18 2:53 PM, Tomas Vondra wrote:
> >> That largely depends on what knobs would be exposed. I'm against adding
> >> some low-level knobs that perhaps 1% of the users will know how to tune,
> >> and the rest will set it incorrectly. Some high-level options that would
> >> specify the workload type might work, but I have no idea about details.
> >
> > Not knowing about details is why we've been stuck here for years: it's
> > not terribly obvious how to create a scheduler that is going to work in
> > all situations. Current autovac is great for 80% of situations, but it
> > simply doesn't handle the remaining 20% by itself. Once you're pushing
> > your IO limits you *have* to start scheduling manual vacuums for any
> > critical tables.
> >
> > At least if we exposed some low level ability to control autovac
> > workers then others could create tools to improve the situation.
> > Currently that's not possible because manual vacuum lacks features
> > that autovac has.
> >
>
> I have my doubts about both points - usefulness of low-level controls
> and viability of tools built on them.
>
> Firstly, my hunch is that if we knew what low-level controls to expose,
> it would pretty much how to implement the tool internally. Exposing
> something just because you home someone will find a use for that seems
> like a dead-end to me. So, which controls would you expose?
>
> Second, all the statistics used to decide which tables need vacuuming
> are already exposed, and we have things like bgworkers etc. So you could
> go and write a custom autovacuum today - copy the autovacuum code, tweak
> the scheduling, done. Yet no such tool emerged yet. Why is that?
>
> >>> One fairly simple option would be to simply replace the logic
> >>> that currently builds a worker's table list with running a query
> >>> via SPI. That would allow for prioritizing important tables. It
> >>> could also reduce the problem of workers getting "stuck" on a ton
> >>> of large tables by taking into consideration the total number of
> >>> pages/tuples a list contains.
> >>>
> >> I don't see why SPI would be needed to do that, i.e. why couldn't
> >> we implement such prioritization with the current approach. Another
> >> thing
> >
> > Sure, it's just a SMOC. But most of the issue here is actually a
> > query problem. I suspect that the current code would actually shrink
> > if converted to SPI. In any case, I'm not wed to that idea.
> >
>
> I disagree this a "query problem" - it certainly is not the case that
> simply prioritizing the tables differently will make a difference. Or
> more precisely, it certainly does not solve the autovacuum issues I'm
> thinking about. I have no idea which issues are you trying to solve,
> because you haven't really described those.
>
> >> is I really doubt prioritizing "important tables" is an good solution,
> >> as it does not really guarantee anything.
> >
> > If by "important" you mean small tables with high update rates,
> > prioritizing those actually would help as long as you have free workers.
> > By itself it doesn't gain all that much though.
> >
>
> Which is why I mentioned we could have separate pools of autovacuum
> workers - one for regular tables, one for "important" ones.
>
> >>> A more fine-grained approach would be to have workers make a new
> >>> selection after every vacuum they complete. That would provide
> >>> the ultimate in control, since you'd be able to see exactly what
> >>> all the other workers are doing.
> >> That was proposed earlier in this thread, and the issue is it may
> >> starve all the other tables when the "important" tables need
> >> cleanup all the time.
> >
> > There's plenty of other ways to shoot yourself in the foot in that
> > regard already. We can always have safeguards in place if we get too
> > close to wrap-around, just like we currently do.
>
> I haven't mentioned wraparound at all.
>
> My point is that if you entirely ignore some tables because "important"
> ones need constant cleanup (saturating the total autovacuum capacity),
> then you'll end up with extreme bloat in those other tables. And then
> you will need to do more work to clean them up, which will likely cause
> delays in cleaning up the important ones.
>
>
> FWIW I find this discussion rather too hand-wavy, because I have no idea
> what controls would you like to expose, etc. If you have an idea, please
> write a patch or at least a proposal explaining the details.
>
> regards
>
> --
> Tomas Vondra
> https://urldefense.com/v3/__http://www.2ndQuadrant.com__;!!Ck4v2Rc!h1TqYTvzyQn1mDOR119nZYsN26gRVhRZ3UIT7sQT60SgRR7833KpQ3CtJ6l0JpMIEt3JBbd09ZZ9-N3U5MHRn7r7RU4NEiRE$
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message tanghy.fnst@fujitsu.com 2022-01-14 13:18:49 RE: Column Filtering in Logical Replication
Previous Message Peter Eisentraut 2022-01-14 12:21:13 Re: [PATCH] Allow multiple recursive self-references