Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size
Date: 2025-01-01 22:50:29
Message-ID: CAHut+Pv=SwxDaW+jMopqM4zSV-QNLhC3U6Cqc6X5AT3ArZ6Q7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Shubham.

Here are some review comments for the patch v4-0001.

======
Commit message.

1.
The 'pg_createsubscriber' utility is updated to fetch and validate the
'max_slot_wal_keep_size' setting from the publisher. A warning is raised during
the '--dry-run' mode if the configuration is set to a non-default value.

~

This says a "warning is raised", but patch v4 changed the warning to
an error, so this description is incompatible with the code.

======
doc/src/sgml/ref/pg_createsubscriber.sgml

2.
+ <para>
+ The 'max_slot_wal_keep_size' must be set to -1 to prevent the automatic
+ removal of WAL files needed by replication slots. Setting this parameter to
+ a specific size may lead to replication failures if required WAL files are
+ prematurely deleted.
+ </para>

The 'max_slot_wal_keep_size' should not be single-quoted like that. It
should use the same markup as the other nearby GUC names and also have
a link like those others do.

======
src/bin/pg_basebackup/pg_createsubscriber.c

3.
+/*
+ * Validate max_slot_wal_keep_size
+ * Logical replication requires max_slot_wal_keep_size to be set to -1 on the
+ * publisher to prevent the deletion of WAL files that are still needed by
+ * replication slots. If this parameter is set to a non-default value, it may
+ * cause replication failures due to required WAL files being prematurely
+ * removed.
+ */

3a.
Not fixed? For patch v3 I had already posted [1-#4a] that this entire
comment is wrongly indented.

~

3b.
That first sentence looks like it is missing a period and a following
blank line. OTOH, maybe the first sentence is not even necessary.

~~~

4.
+ if (dry_run && max_slot_wal_keep_size != -1)
+ {
+ pg_log_error("publisher requires max_slot_wal_keep_size to be -1,
but only %d remain",
+ max_slot_wal_keep_size);
+ pg_log_error_hint("Change the configuration parameter \"%s\" on the
publisher to %d.",
+ "max_slot_wal_keep_size", -1);
+ failed = true;
+ }
+

4a.
Not fixed? For patch v3 I had already posted [1-#4b] that it seemed
strange you did not use the \"%s\" substitution for the GUC name of
the first message (unlike the 2nd message).

I also think it is strange that the 1st message uses a hardwired -1,
but the 2nd message uses a parameter for the -1.

~~

4b.
I don't think "but only %d remain" is the appropriate wording for this
GUC. It looks like a cut/paste mistake from a previous message.
Anyway, maybe this message should be something quite different so it
is not just duplicating the same information as the error_hint. e.g.
see below for one idea. This could also resolve the previous review
comment.

SUGGESTION
"publisher requires WAL size must not be restricted"

~

4c.
I saw that this only emits the message in dry-mode, which is
apparently based on the suggestion from Vignesh [2-#1]. Meanwhile, all
the comments/docs say "it may cause replication failures", so I felt
it might be better to always stop everything up-front if there is a
wrong configuration, instead of waiting for potential failures to
happen -- that's why I had suggested using error instead of a warning,
but maybe I am in the minority.

IIUC there are two ways to address this configuration problem:

A. Give warnings, but only in dry-run mode. If the warnings are
ignored (or if the dry-run was not even done), there may be
replication failures later, but we HOPE it will not happen.

B. Give errors in all modes. The early config error prevents any
chance of later replication errors.

So, I think the implementation needs to choose either A or B.
Currently, it seems a mixture.

======
[1] my v3 review -
https://www.postgresql.org/message-id/CAHut%2BPsgEphCa-P-3Q7cORA%3D1TRv15UUsaxN9ZkX56M1-J_QRg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CALDaNm09cRzke52UN5zx33PT390whU92oXY4gfOSZEo17CLPjw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2025-01-01 22:55:32 Incorrect CHUNKHDRSZ in nodeAgg.c
Previous Message Heikki Linnakangas 2025-01-01 22:12:36 Re: POC: make mxidoff 64 bits