Re: why there is not VACUUM FULL CONCURRENTLY?

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why there is not VACUUM FULL CONCURRENTLY?
Date: 2025-01-10 09:31:47
Message-ID: 2532.1736501507@antos
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

> Hi
>
> On Thu, Jan 9, 2025 at 14:35, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2024-Dec-11, Antonin Houska wrote:
>
> > Oh, it was too messy. I think I was thinking of too many things at once (such
> > as locking the old heap, the new heap and the new heap's TOAST). Also, one
> > thing that might have contributed to the confusion is that make_new_heap() has
> > the 'lockmode' argument, which receives various values from various
> > callers. However, both the new heap and its TOAST relation are eventually
> > created by heap_create_with_catalog(), and this function always leaves the new
> > relation locked with AccessExclusiveLock. Maybe this needs some refactoring.
> >
> > Therefore I reverted the changes around make_new_heap() and simply pass NoLock
> > for lockmode in cluster.c
>
> Cool, thanks, I have pushed this. I made some additional minor changes,
> nothing earth-shattering.
>
> Meanwhile the patch 0004 has some seemingly trivial conflicts. If you
> want to rebase, I'd appreciate that. In the meantime I'll give a look
> at the next two other API changes.
>
> I'm not happy with the idea of having this new command be VACUUM (FULL
> CONCURRENTLY). It's a bit of an absurd name if you ask me. Heck, even
> VACUUM (FULL) seems a bit absurd nowadays.
>
> Although it may sound absurd, it makes perfect sense to me: both "FULL" and "CONCURRENTLY" are terms that have been in use for years.
>
> Maybe we can introduce a synonym like COMPACT for FULL.

Yes, at first glance, FULL might suggest to users that it processes the whole
table, but VACUUM does that regardless of this option. COMPACT would be more
accurate because it would convey that, besides removing dead tuples, unused
space is properly reclaimed.

However, I'm not sure the FULL option should have been added to VACUUM at
all. Note that, internally, it uses a completely different approach to the
problem of garbage collection. As a consequence, there are several options
which are not compatible with the FULL option: PARALLEL,
DISABLE_PAGE_SKIPPING, BUFFER_USAGE_LIMIT, and maybe some more.
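
For illustration, here is a minimal sketch of those incompatibilities, assuming
a hypothetical table named "t" and a release that has all three options
(BUFFER_USAGE_LIMIT exists only since PostgreSQL 16). Each of the last three
statements is expected to be rejected with an error rather than executed; the
exact error wording may differ between versions.

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE t (id int PRIMARY KEY, payload text);

-- Plain VACUUM accepts these options.
VACUUM (PARALLEL 2, DISABLE_PAGE_SKIPPING) t;

-- Combined with FULL, each of these is expected to fail,
-- because FULL rewrites the table instead of vacuuming it in place.
VACUUM (FULL, PARALLEL 2) t;
VACUUM (FULL, DISABLE_PAGE_SKIPPING) t;
VACUUM (FULL, BUFFER_USAGE_LIMIT '1MB') t;
```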

Thus I understand Alvaro's objections against VACUUM (FULL, CONCURRENTLY).

> I don't see a strong benefit for introducing a new command (with almost all
> identical functionality) just because the words sound strange.

If we turn the FULL option into an alias for the new command, and remove that
after "some time", then there is no identical functionality anymore.

The new functionality overlaps with CLUSTER, except that it works
CONCURRENTLY. However, invoking the new functionality via CLUSTER
(CONCURRENTLY) is not a complete solution because it's also usable w/o
ordering. That's why a new command makes sense to me.

After all, the new code aims primarily at bloat removal rather than at
ordering. Note that it only orders the existing rows, but does not even try to
order the rows inserted into the table while the data is being copied to the
new file.

Therefore I can imagine adding a new command that acts like VACUUM (FULL,
CONCURRENTLY), but does not try to be CLUSTER (CONCURRENTLY).

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
