Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

From: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size
Date: 2025-01-02 09:21:43
Message-ID: CAHv8RjKTxgUU37Ou42FRio-28+27xAiY_p_9yA=Yvmi4VFD2dw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 2, 2025 at 4:20 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Shubham.
>
> Here are some review comments for the patch v4-0001.
>
> ======
> Commit message.
>
> 1.
> The 'pg_createsubscriber' utility is updated to fetch and validate the
> 'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
> the '--dry-run' mode if the configuration is set to a non-default value.
>
>
> This says a "warning is raised", but patch v4 changed the warning to
> an error, so this description is incompatible with the code.
>

Since, this patch now raises a warning instead of an error so I think
this should be the same as before.

> ======
> doc/src/sgml/ref/pg_createsubscriber.sgml
>
> 2.
> + <para>
> + The 'max_slot_wal_keep_size' must be set to -1 to prevent the automatic
> + removal of WAL files needed by replication slots. Setting this parameter to
> + a specific size may lead to replication failures if required WAL files are
> + prematurely deleted.
> + </para>
>
> The 'max_slot_wal_keep_size' should not be single-quoted like that. It
> should use the same markup as the other nearby GUC names and also have
> a link like those others do.
>

Added the link for 'max_slot_wal_keep_size' and updated the
'pg_createsubscriber' documentation.

> ======
> src/bin/pg_basebackup/pg_createsubscriber.c
>
> 3.
> +/*
> + * Validate max_slot_wal_keep_size
> + * Logical replication requires max_slot_wal_keep_size to be set to -1 on the
> + * publisher to prevent the deletion of WAL files that are still needed by
> + * replication slots. If this parameter is set to a non-default value, it may
> + * cause replication failures due to required WAL files being prematurely
> + * removed.
> + */
>
> 3a.
> Not fixed? For patch v3 I had already posted [1-#4a] that this entire
> comment is wrongly indented.
>

Fixed.

>
> 3b.
> That first sentence looks like it is missing a period and a following
> blank line. OTOH, maybe the first sentence is not even necessary.
>

Fixed.

>
> 4.
> + if (dry_run && max_slot_wal_keep_size != -1)
> + {
> + pg_log_error("publisher requires max_slot_wal_keep_size to be -1,
> but only %d remain",
> + max_slot_wal_keep_size);
> + pg_log_error_hint("Change the configuration parameter \"%s\" on the
> publisher to %d.",
> + "max_slot_wal_keep_size", -1);
> + failed = true;
> + }
> +
>
> 4a.
> Not fixed? For patch v3 I had already posted [1-#4b] that it seemed
> strange you did not use the \"%s\" substitution for the GUC name of
> the first message (unlike the 2nd message).
>
> I also think it is strange that the 1st message uses a hardwired -1,
> but the 2nd message uses a parameter for the -1.
>

Updated the warning message and warning detail.

>
> 4b.
> I don't think "but only %d remain" is the appropriate wording for this
> GUC. It looks like a cut/paste mistake from a previous message.
> Anyway, maybe this message should be something quite different so it
> is not just duplicating the same information as the error_hint. e.g.
> see below for one idea. This could also resolve the previous review
> comment.
>
> SUGGESTION
> "publisher requires WAL size must not be restricted"
>

I have used your suggestion in the latest patch.

>
> 4c.
> I saw that this only emits the message in dry-mode, which is
> apparently based on the suggestion from Vignesh [2-#1]. Meanwhile, all
> the comments/docs say "it may cause replication failures", so I felt
> it might be better to always stop everything up-front if there is a
> wrong configuration, instead of waiting for potential failures to
> happen -- that's why I had suggested using error instead of a warning,
> but maybe I am in the minority.
>
> IIUC there are two ways to address this configuration problem:
>
> A. Give warnings, but only in dry-run mode. If the warnings are
> ignored (or if the dry-run was not even done), there may be
> replication failures later, but we HOPE it will not happen.
>
> B. Give errors in all modes. The early config error prevents any
> chance of later replication errors.
>
> So, I think the implementation needs to choose either A or B.
> Currently, it seems a mixture.
>
> ======
> [1] my v3 review -
> https://www.postgresql.org/message-id/CAHut%2BPsgEphCa-P-3Q7cORA%3D1TRv15UUsaxN9ZkX56M1-J_QRg%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CALDaNm09cRzke52UN5zx33PT390whU92oXY4gfOSZEo17CLPjw%40mail.gmail.com

If 'max_slot_wal_keep_size' is not set to -1, there is no surety that
'pg_createsubscriber' will function as expected and the removal of WAL
files will start to occur. Therefore, a warning is raised instead of
an error to alert users about the potential configuration issue.
If 'max_slot_wal_keep_size' is -1 (the default), replication slots may
retain an unlimited amount of WAL files.

The attached Patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.

Attachment Content-Type Size
v5-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patch application/octet-stream 6.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2025-01-02 09:27:17 Re: Conflict detection for update_deleted in logical replication
Previous Message Zhou, Zhiguo 2025-01-02 09:14:44 RFC: Lock-free XLog Reservation from WAL