Re: why there is not VACUUM FULL CONCURRENTLY?

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why there is not VACUUM FULL CONCURRENTLY?
Date: 2025-01-31 13:34:48
Message-ID: 202501311334.za2d2icg37sr@alvherre.pgsql
Lists: pgsql-hackers

On 2025-Jan-31, Antonin Houska wrote:

> Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> wrote:

> > First, due to the XLog-based change detection this feature can't work
> > for unlogged tables without first changing them to logged (which
> > implies first writing the whole table to XLog, to not cause issues on
> > any replicas). However, documentation for this limitation seems to be
> > missing from the patches, and I hope a solution can be found without
> > requiring LOGGED.
>
> Currently I have no idea how to handle UNLOGGED tables. I'll at least fix the
> documentation.

Yeah, I think it should be possible, but it's going to require
complicated additional changes to support. I suggest that in the first
version we leave this out, and we can implement it afterwards.

> > For (2), I think the scan needs a snapshot to guarantee we keep the
> > original tuples of updates around, which will hold back any other
> > VACUUM activity in the database.

> A single snapshot is used because there is a single stream of decoded data
> changes. Thus a new version of a tuple is either visible to the snapshot or it
> appears in the stream, but not both.

I agree with Matthias that this is going to be a problem. In fact, if
the snapshot has to be held for long enough (which depends on how long
it takes to scan the table), it would hold back vacuuming on all the
other tables, causing more bloat. If it's bad enough (say because the
table is big enough to take hours to repack and recreate the indexes
on), the bloat situation might be worse after REPACK has completed than
it was before.
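
To make the concern concrete, here is a toy model of how one long-lived
snapshot holds back cleanup everywhere. It is only a sketch, not
PostgreSQL code: the names DeadTuple and removable, the table names, and
all the xid values are made up for illustration, and the visibility rule
is reduced to a single comparison against the oldest snapshot xmin.

```python
# Toy model only; none of these names exist in PostgreSQL.
from dataclasses import dataclass


@dataclass
class DeadTuple:
    table: str
    deleted_by_xid: int  # xid of the transaction that deleted/updated it


def removable(dead_tuples, snapshot_xmins, next_xid):
    """Return the dead tuples a vacuum could remove in this toy model.

    Simplifying assumption: a dead tuple can go only if the deleting
    transaction is older than the oldest xmin of any open snapshot.
    """
    horizon = min(snapshot_xmins, default=next_xid)
    return [t for t in dead_tuples if t.deleted_by_xid < horizon]


dead = [
    DeadTuple("orders", 90),
    DeadTuple("orders", 150),
    DeadTuple("events", 400),
    DeadTuple("sessions", 900),
]

# Only a short-lived transaction (xmin 950) is open.
without_long_snapshot = removable(dead, [950], next_xid=1000)

# A long-running repack took its snapshot at xmin 100 and still holds it.
with_long_snapshot = removable(dead, [100, 950], next_xid=1000)

print(len(without_long_snapshot), len(with_long_snapshot))
```

In this model the held snapshot keeps dead tuples in "events" and
"sessions" from being removed even though the repack never touches those
tables, which is the cross-table bloat described above.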

But -- again -- I think we need to limit the complexity of this patch,
or otherwise we're never going to get it done. So I propose that in our
first implementation we continue to use a single snapshot, and we can
try to find ways to grab fresh snapshots from time to time as a later
improvement on the patch. Customers whose situation is so bad that they
can't use REPACK to fix their bloat in 18 are already unable to fix it
in earlier versions, so this would not be a regression.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"No tengo por qué estar de acuerdo con lo que pienso"
(Carlos Caszeli)
