From: Antonin Houska <ah(at)cybertec(dot)at>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why there is not VACUUM FULL CONCURRENTLY?
Date: 2025-01-31 20:02:47
Message-ID: 26884.1738353767@antos
Lists: pgsql-hackers
Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> On 2025-Jan-31, Antonin Houska wrote:
>
> > Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> wrote:
>
> > > First, due to the XLog-based change detection this feature can't work
> > > for unlogged tables without first changing them to logged (which
> > > implies first writing the whole table to XLog, to not cause issues on
> > > any replicas). However, documentation for this limitation seems to be
> > > missing from the patches, and I hope a solution can be found without
> > > requiring LOGGED.
> >
> > Currently I have no idea how to handle UNLOGGED tables. I'll at least fix
> > the documentation.
>
> Yeah, I think it should be possible, but it's going to require
> complicated additional changes to support. I suggest that in the first
> version we leave this out, and we can implement it afterwards.
>
> > > For (2), I think the scan needs a snapshot to guarantee we keep the
> > > original tuples of updates around, which will hold back any other
> > > VACUUM activity in the database.
>
> > A single snapshot is used because there is a single stream of decoded data
> > changes. Thus a new version of a tuple is either visible to the snapshot or it
> > appears in the stream, but not both.
>
> I agree with Matthias that this is going to be a problem. In fact, if
> we need to keep the snapshot for long enough (depending on how long it
> takes to scan the table), then the snapshot that it needs to keep would
> disrupt vacuuming on all the other tables, causing more bloat. If it's
> bad enough (say because the table is big enough to take hours to repack
> and recreate the indexes on), the bloat situation might be worse after
> REPACK has completed than it was before.
>
> But -- again -- I think we need to limit the complexity of this patch,
> or otherwise we're never going to get it done. So I propose that in our
> first implementation we continue to use a single snapshot, and we can
> try to find ways to grab fresh snapshots from time to time as a later
> improvement on the patch. Customers in situations so bad that they
> can't use REPACK to fix their bloat in 18, are already unable to fix it
> in earlier versions, so this would not be a regression.
I thought about it more during the afternoon. I think that in this case
(i.e. snapshot created by the logical replication system), the xmin horizon is
controlled by the xmin of the replication slot rather than that of the
snapshot. And I think that the slot we use for REPACK can have xmin set to
invalid (unlike catalog_xmin) as long as we ensure that (even "lazy") VACUUM
ignores any table that is being processed by REPACK. In other words, REPACK does
not have to disrupt vacuuming of the other tables. Please correct me if I'm
wrong.
Since the current proposal of REPACK already stores the relation OID in shared
memory (so that all backends know they should write enough information to WAL
when making changes to the table), disabling VACUUM for that table should not
be difficult.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com