Re: autovacuum prioritization

From: Greg Stark <stark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: autovacuum prioritization
Date: 2022-01-26 23:56:07
Message-ID: CAM-w4HMWkGGfygmSCZde4r3zg0A_QOmGuFr-AQ3EdtQV6WnE_A@mail.gmail.com
Lists: pgsql-hackers

On Wed, 26 Jan 2022 at 18:46, Greg Stark <stark(at)mit(dot)edu> wrote:
>
> On Thu, 20 Jan 2022 at 14:31, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > In my view, previous efforts in this area have been too simplistic.
> >
>
> One thing I've been wanting to do something about is I think
> autovacuum needs to be a little cleverer about when *not* to vacuum a
> table because it won't do any good.
>
> I've seen a lot of cases where autovacuum kicks off a vacuum of a
> table even though the globalxmin hasn't really advanced significantly
> over the oldest frozen xid. When it's a large table this really hurts
> because it could be hours or days before it finishes and at that point
> there's quite a bit of bloat.

Another case I would like to see autovacuum get clever about is when
there is a wide disparity in the size of tables. If you have a few
large tables and a few small tables, there could be enough bandwidth
for everyone, but you can get into trouble if the workers are all tied
up vacuuming the large tables.

This is a case where autovacuum scheduling can create a problem where
there shouldn't be one. It often happens when you have a set of large
tables that were all loaded with data around the same time, alongside
busy, well-designed small tables receiving lots of updates. The small
tables can happily be vacuumed every 15-30 minutes, finishing promptly
and maintaining a nice steady state. Then one day all the large tables
hit the freeze threshold at once, all your workers are busy vacuuming
huge tables that take hours or days, and your small tables bloat by
orders of magnitude.

I was thinking of dividing the eligible tables up into ntiles based on
size and then making sure one worker was responsible for each ntile.
I'm not sure that would actually be quite right though.
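
To make the ntile idea concrete, here is a rough sketch in Python
rather than anything resembling the actual autovacuum launcher code.
The function name, the table names, and the sizes are all made up for
illustration, and it assumes "ntile" means equal-count buckets over
the tables sorted by size, which is only one possible reading.

```python
from typing import Dict, List


def split_into_ntiles(table_sizes: Dict[str, int], n_workers: int) -> List[List[str]]:
    """Sort tables by size and cut them into n_workers equal-count buckets.

    Hypothetical helper: bucket i would be the set of tables that
    worker i is responsible for. Not a PostgreSQL API.
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    # Order smallest to largest; break ties by name so the result is stable.
    ordered = sorted(table_sizes, key=lambda name: (table_sizes[name], name))
    buckets: List[List[str]] = [[] for _ in range(n_workers)]
    for rank, name in enumerate(ordered):
        # Map the table's rank in the sorted order onto a bucket index.
        buckets[rank * n_workers // len(ordered)].append(name)
    return buckets


# Made-up sizes (in pages): three big bulk-loaded tables, three small hot ones.
sizes = {
    "big_a": 500_000,
    "big_b": 400_000,
    "big_c": 450_000,
    "hot_1": 10,
    "hot_2": 20,
    "hot_3": 15,
}
assignment = split_into_ntiles(sizes, 3)
for worker, tables in enumerate(assignment):
    print(worker, tables)
```

With these made-up numbers the middle bucket ends up holding both
hot_2 and big_b, so one small table still shares a worker with a huge
one. That may be part of why equal-count ntiles are not quite right;
cutting on size boundaries instead of on rank could behave differently.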

--
greg
