Re: why there is not VACUUM FULL CONCURRENTLY?

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Junwang Zhao <zhjwpku(at)gmail(dot)com>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why there is not VACUUM FULL CONCURRENTLY?
Date: 2024-09-04 11:41:35
Message-ID: 9421.1725450095@antos
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:

> Thanks for working on this, I think this is a very useful feature.
>
> The patch doesn't compile in the debug build with errors:
>
> ../postgres/src/backend/commands/cluster.c: In function β€˜get_catalog_state’:
> ../postgres/src/backend/commands/cluster.c:2771:33: error: declaration
> of β€˜td_src’ shadows a previous local [-Werror=shadow=compatible-local]
> 2771 | TupleDesc td_src, td_dst;
> | ^~~~~~
> ../postgres/src/backend/commands/cluster.c:2741:25: note: shadowed
> declaration is here
> 2741 | TupleDesc td_src = RelationGetDescr(rel);

ok, gcc14 complains here, the compiler I used before did not. Fixed.

> you forgot the meson build for pgoutput_cluster
>
> diff --git a/src/backend/meson.build b/src/backend/meson.build
> index 78c5726814..0f9141a4ac 100644
> --- a/src/backend/meson.build
> +++ b/src/backend/meson.build
> @@ -194,5 +194,6 @@ pg_test_mod_args = pg_mod_args + {
> subdir('jit/llvm')
> subdir('replication/libpqwalreceiver')
> subdir('replication/pgoutput')
> +subdir('replication/pgoutput_cluster')

Fixed, thanks. That might be the reason for the cfbot to fail when using
meson.

> I noticed that you use lmode/lock_mode/lockmode, there are lmode and lockmode
> in the codebase, but I remember someone proposed all changes to lockmode, how
> about sticking to lockmode in your patch?

Fixed.

> 0004:
>
> + sure that the old files do not change during the processing because the
> + chnages would get lost due to the swap.
> typo

Fixed.

>
> + files. The data changes that took place during the creation of the new
> + table and index files are captured using logical decoding
> + (<xref linkend="logicaldecoding"/>) and applied before
> + the <literal>ACCESS EXCLUSIVE</literal> lock is requested. Thus the lock
> + is typically held only for the time needed to swap the files, which
> + should be pretty short.
>
> I remember pg_squeeze also did some logical decoding after getting the exclusive
> lock, if that is still true, I guess the doc above is not precise.

The decoding takes place before requesting the lock, as well as after
that. I've adjusted the paragraph, see 0007.

> + Note that <command>CLUSTER</command> with the
> + the <literal>CONCURRENTLY</literal> option does not try to order the
> + rows inserted into the table after the clustering started.
>
> Do you mean after the *logical decoding* started here? If CLUSTER CONCURRENTLY
> does not order rows at all, why bother implementing it?

The rows inserted before CLUSTER (CONCURRENTLY) started do get ordered, the
rows inserted after that do not. (Actually what matters is when the snapshot
for the initial load is created, but that happens in very early stage of the
processing. Not sure if user is interested in such implementation details.)

> + errhint("CLUSTER CONCURRENTLY is only allowed for permanent relations")));
>
> errhint messages should end with a dot. Why hardcoded to "CLUSTER CONCURRENTLY"
> instead of parameter *stmt*.

Fixed.

> + ResourceOwner oldowner = CurrentResourceOwner;
> +
> + /*
> + * In the CONCURRENT case, do the planning in a subtrensaction so that
> typo

Fixed.

> I did not see VacuumStmt changes in gram.y, how do we suppose to
> use the vacuum full concurrently? I tried the following but no success.

With the "parethesized syntax", new options can be added w/o changing
gram.y. (While the "unparenthesized syntax" is deprecated.)

> [local] postgres(at)demo:5432-36097=# vacuum (concurrently) aircrafts_data;
> ERROR: CONCURRENTLY can only be specified with VACUUM FULL

The "lazy" VACUUM works concurrently as such.

> [local] postgres(at)demo:5432-36097=# vacuum full (concurrently) full
> aircrafts_data;
> ERROR: syntax error at or near "("
> LINE 1: vacuum full (concurrently) full aircrafts_data;

This is not specific to the CONCURRENTLY option. For example:

postgres=3D# vacuum full (analyze) full aircrafts_data;
ERROR: syntax error at or near "("
LINE 1: vacuum full (analyze) full aircrafts_data;

(You seem to combine the parenthesized syntax with the unparenthesized.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment Content-Type Size
v03-0001-Adjust-signature-of-cluster_rel-and-its-subroutines.patch text/x-diff 14.9 KB
v03-0002-Move-progress-related-fields-from-PgBackendStatus-to.patch text/x-diff 6.4 KB
v03-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch text/x-diff 5.4 KB
v03-0004-Add-CONCURRENTLY-option-to-both-VACUUM-FULL-and-CLUS.patch text/plain 171.4 KB
v03-0005-Preserve-visibility-information-of-the-concurrent-da.patch text/x-diff 39.0 KB
v03-0006-Add-regression-tests.patch text/x-diff 10.4 KB
v03-0007-Introduce-cluster_max_xlock_time-configuration-varia.patch text/x-diff 20.4 KB
v03-0008-Call-logical_rewrite_heap_tuple-when-applying-concur.patch text/x-diff 26.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-09-04 11:54:56 Re: BUG #18598: AddressSanitizer detects use after free inside json_unique_hash_match()
Previous Message Amit Kapila 2024-09-04 11:34:05 Re: Commit Timestamp and LSN Inversion issue