Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

From: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size
Date: 2024-12-30 06:34:33
Message-ID: CAHv8RjJ7E3LteqiumZXpmyS=xz1QY2vZVV+9nON6UWyp++UP+g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 30, 2024 at 10:10 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, 30 Dec 2024 at 09:34, Shubham Khanna
> <khannashubham1197(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > Currently, there is a risk that pg_createsubscriber may fail to
> > complete successfully when the max_slot_wal_keep_size value is set too
> > low. This can occur if the WAL is removed before the standby using the
> > replication slot is able to complete replication, as the required WAL
> > files are no longer available.
> >
> > I was able to reproduce this issue using the following steps:
> > 1) Set up a streaming replication environment.
> > 2) Run pg_createsubscriber in a debugger.
> > 3) Pause pg_createsubscriber at the setup_recovery stage.
> > 4) Perform several operations on the primary node to generate a large
> >    volume of WAL, causing older WAL segments to be removed due to the
> >    low max_slot_wal_keep_size setting.
> > 5) Once the necessary WAL segments are deleted, continue the execution
> >    of pg_createsubscriber.
> > At this point, pg_createsubscriber fails with the following error:
> > 2024-12-29 01:21:37.590 IST [427353] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000003 has already been removed
> > 2024-12-29 01:21:37.592 IST [427345] LOG: waiting for WAL to become available at 0/3000110
> > 2024-12-29 01:21:42.593 IST [427358] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
> > 2024-12-29 01:21:42.593 IST [427358] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000000000003 has already been removed
> >
> > This issue was previously reported in [1], with a suggestion to raise
> > a warning in [2]. I’ve implemented a patch that logs a warning in
> > dry-run mode. This will give users the opportunity to adjust the
> > max_slot_wal_keep_size value before running the command.
> >
> > Thoughts?
>
> +1 for throwing a warning in dry-run mode
>
> Few comments:
> 1) We can have this check only in dry-run mode. It is not required in
> non-dry-run mode, as there is not much the user can do once the tool
> is running. We can change this:
> +    if (max_slot_wal_keep_size != -1)
> +    {
> +        pg_log_warning("publisher requires 'max_slot_wal_keep_size = -1', but only %d remain",
> +                       max_slot_wal_keep_size);
> +        pg_log_warning_detail("Change the 'max_slot_wal_keep_size' configuration on the publisher to -1.");
> +    }
>
> to:
> +    if (dry_run && max_slot_wal_keep_size != -1)
> +    {
> +        pg_log_warning("publisher requires 'max_slot_wal_keep_size = -1', but only %d remain",
> +                       max_slot_wal_keep_size);
> +        pg_log_warning_detail("Change the 'max_slot_wal_keep_size' configuration on the publisher to -1.");
> +    }
>
> 2) This error message is not quite right; can we change it to
> "publisher requires max_slot_wal_keep_size to be -1, but is set to %d"?
> +    if (max_slot_wal_keep_size != -1)
> +    {
> +        pg_log_warning("publisher requires 'max_slot_wal_keep_size = -1', but only %d remain",
> +                       max_slot_wal_keep_size);
> +        pg_log_warning_detail("Change the 'max_slot_wal_keep_size' configuration on the publisher to -1.");
> +    }
>
> 3) Also, the configuration parameter name could be passed through a
> format specifier, as is done in the earlier case.
>

I have addressed the comments above. The attached v2 patch contains the
suggested changes.
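
For readers skimming the thread, here is a minimal stand-alone sketch of
what the three review comments amount to when taken together. It is not
the code from the attached patch: the helper name
format_wal_keep_warning, the WAL_KEEP_GUC macro, and the
caller-supplied buffer are hypothetical, and snprintf stands in for the
logging call so that the snippet compiles outside the PostgreSQL tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parameter name kept out of the message text (review comment 3). */
#define WAL_KEEP_GUC "max_slot_wal_keep_size"

/*
 * Hypothetical helper, not taken from the patch.  Writes the warning text
 * into buf and returns true only when a warning is warranted: dry-run
 * mode (review comment 1) and a publisher setting other than -1.
 */
static bool
format_wal_keep_warning(bool dry_run, int max_slot_wal_keep_size,
                        char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    if (!dry_run || max_slot_wal_keep_size == -1)
        return false;

    /* Wording follows review comment 2. */
    snprintf(buf, buflen,
             "publisher requires \"%s\" to be -1, but is set to %d",
             WAL_KEEP_GUC, max_slot_wal_keep_size);
    return true;
}
```

In pg_createsubscriber itself the text would presumably go through
pg_log_warning() rather than a buffer; the buffer is used here only so
that the condition and the wording can be checked in isolation.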

Thanks and regards,
Shubham Khanna.

Attachment: v2-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patch (application/octet-stream, 4.4 KB)
