Re: why there is not VACUUM FULL CONCURRENTLY?

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why there is not VACUUM FULL CONCURRENTLY?
Date: 2024-08-30 10:18:12
Message-ID: CAEG8a3+Dq5Kr9=banso5Z5_osHScpHv9-GZTrb8SOug_geY1Cw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Aug 27, 2024 at 8:01 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> Attached is version 2, the feature itself is now in 0004.
>
> Unlike version 1, it contains some regression tests (0006) and a new GUC to
> control how long the AccessExclusiveLock may be held (0007).
>
> Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:
>
> > On Fri, 2 Aug 2024 at 11:09, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > >
> > > Kirill Reshke <reshkekirill(at)gmail(dot)com> wrote:
> > > > However, in general, the 3rd patch is really big, very hard to
> > > > comprehend. Please consider splitting this into smaller (and
> > > > reviewable) pieces.
> > >
> > > I'll try to move some preparation steps into separate diffs, but not sure if
> > > that will make the main diff much smaller. I prefer self-contained patches, as
> > > also explained in [3].
> >
> > Thanks for sharing [3], it is a useful link.
> >
> > There is actually one more case when ACCESS EXCLUSIVE is held: during
> > table rewrite (AT set TAM, AT set Tablespace and AT alter column type
> > are some examples).
> > This can be done CONCURRENTLY too, using the same logical replication
> > approach, or do I miss something?
>
> Yes, the logical replication can potentially be used in other cases.
>
> > I'm not saying we must do it immediately, this should be a separate
> > thread, but we can do some preparation work here.
> >
> > I can see that a bunch of functions which are currently placed in
> > cluster.c can be moved to something like
> > logical_rewrite_heap.c. ConcurrentChange struct and
> > apply_concurrent_insert function is one example of such.
> >
> > So, if this is the case, 0003 patch can be splitted in two:
> > The first one is general utility code for logical table rewrite
> > The second one with actual VACUUM CONCURRENTLY feature.
>
> > What do you think?
>
> I can imagine moving the function process_concurrent_changes() and subroutines
> to a different file (e.g. rewriteheap.c), but moving it into a separate diff
> that does not contain any call of the function makes little sense to me. Such
> a diff would not add any useful functionality and could not be considered
> refactoring either.
>
> So far I at least moved some code to separate diffs: 0003 and 0005. I'll move
> more if I find sensible opportunity in the future.
>
> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com
>

Thanks for working on this, I think this is a very useful feature.

The patch doesn't compile in the debug build with errors:

../postgres/src/backend/commands/cluster.c: In function ‘get_catalog_state’:
../postgres/src/backend/commands/cluster.c:2771:33: error: declaration
of ‘td_src’ shadows a previous local [-Werror=shadow=compatible-local]
2771 | TupleDesc td_src, td_dst;
| ^~~~~~
../postgres/src/backend/commands/cluster.c:2741:25: note: shadowed
declaration is here
2741 | TupleDesc td_src = RelationGetDescr(rel);

you forgot the meson build for pgoutput_cluster

diff --git a/src/backend/meson.build b/src/backend/meson.build
index 78c5726814..0f9141a4ac 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
subdir('jit/llvm')
subdir('replication/libpqwalreceiver')
subdir('replication/pgoutput')
+subdir('replication/pgoutput_cluster')

I noticed that you use lmode/lock_mode/lockmode, there are lmode and lockmode
in the codebase, but I remember someone proposed all changes to lockmode, how
about sticking to lockmode in your patch?

0004:

+ sure that the old files do not change during the processing because the
+ chnages would get lost due to the swap.
typo

+ files. The data changes that took place during the creation of the new
+ table and index files are captured using logical decoding
+ (<xref linkend="logicaldecoding"/>) and applied before
+ the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
+ is typically held only for the time needed to swap the files, which
+ should be pretty short.

I remember pg_squeeze also did some logical decoding after getting the exclusive
lock, if that is still true, I guess the doc above is not precise.

+ Note that <command>CLUSTER</command> with the
+ the <literal>CONCURRENTLY</literal> option does not try to order the
+ rows inserted into the table after the clustering started.

Do you mean after the *logical decoding* started here? If CLUSTER CONCURRENTLY
does not order rows at all, why bother implementing it?

+ errhint("CLUSTER CONCURRENTLY is only allowed for permanent relations")));

errhint messages should end with a dot. Why hardcoded to "CLUSTER CONCURRENTLY"
instead of parameter *stmt*.

+ ResourceOwner oldowner = CurrentResourceOwner;
+
+ /*
+ * In the CONCURRENT case, do the planning in a subtrensaction so that
typo

I did not see VacuumStmt changes in gram.y, how do we suppose to
use the vacuum full concurrently? I tried the following but no success.

[local] postgres(at)demo:5432-36097=# vacuum (concurrently) aircrafts_data;
ERROR: CONCURRENTLY can only be specified with VACUUM FULL

[local] postgres(at)demo:5432-36097=# vacuum full (concurrently) full
aircrafts_data;
ERROR: syntax error at or near "("
LINE 1: vacuum full (concurrently) full aircrafts_data;

--
Regards
Junwang Zhao

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-08-30 11:33:15 Re: allowing extensions to control planner behavior
Previous Message Amit Kapila 2024-08-30 10:13:12 Re: Add contrib/pg_logicalsnapinspect