Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-12-21 09:36:06
Message-ID: CAJpy0uCaUFpzYVdCsPzNQmJMFaNGEE16wbSwWS-a=+iyX+Y3cQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 20, 2023 at 12:02 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are some comments for the patch v50-0002.

Thank You for the feedback. I have addressed these in v52.

> ======
> GENERAL
>
> (I made a short study of all the ereports in this patch -- here are
> some findings)
>
> ~~~
>
> 0.1 Don't need the parentheses.
>
> Checking all the ereports I see that half of them have the redundant
> parentheses and half of them do not; You might as well make them all
> use the new style where the extra parentheses are not needed.
>
> e.g.
> + ereport(LOG,
> + (errmsg("skipping slot synchronization"),
> + errdetail("enable_syncslot is disabled.")));
>
> e.g.
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot drop replication slot \"%s\"", name),
> + errdetail("This slot is being synced from the primary server.")));
>
> and many more like this. Search for all the ereports.
>
> ~~~
>
> 0.2
> + ereport(LOG,
> + (errmsg("dropped replication slot \"%s\" of dbid %d as it "
> + "was not sync-ready", NameStr(s->data.name),
> + s->data.database)));
>
> I felt maybe that could be:
>
> errmsg("dropped replication slot \"%s\" of dbid %d", ...
> errdetail("It was not sync-ready.")
>
> (now this shares the same errmsg with another ereport)
>
> ~~~
>
> 0.3.
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("skipping sync of slot \"%s\" as it is a user created"
> + " slot", remote_slot->name),
> + errdetail("This slot has failover enabled on the primary and"
> + " thus is sync candidate but user created slot with"
> + " the same name already exists on the standby.")));
>
> This seemed too wordy. Can't it be shortened (maybe like below)
> without losing any of the vital information?
>
> errmsg("skipping sync of slot \"%s\"", ...)
> errdetail("A user-created slot with the same name already exists on
> the standby.")

I have modified it a little bit more. Please see now. I wanted to add
the info that slot-sync worker is exiting instead of skipping a slot
and that the concerned slot is a failover slot on primary. These were
the other comments around the same.

> ~~~
>
> 0.4
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("exiting from slot synchronization due to bad configuration"),
> + /* translator: second %s is a GUC variable name */
> + errdetail("The primary slot \"%s\" specified by %s is not valid.",
> + PrimarySlotName, "primary_slot_name")));
>
> /The primary slot/The primary server slot/
>
> ~~~
>
> 0.5
> + ereport(ERROR,
> + (errmsg("could not fetch primary_slot_name \"%s\" info from the "
> + "primary: %s", PrimarySlotName, res->err)));
>
> /primary:/primary server:/
>
> ~~~
>
> 0.6
> The continuations for long lines are inconsistent. Sometimes there are
> trailing spaces and sometimes there are leading spaces. And sometimes
> there are both at the same time which would cause double-spacing in
> the message! Please make them all the same. I think using leading
> spaces is easier but YMMV.
>
> e.g.
> + elog(ERROR,
> + "not synchronizing local slot \"%s\" LSN(%X/%X)"
> + " to remote slot's LSN(%X/%X) as synchronization "
> + " would move it backwards", remote_slot->name,
> + LSN_FORMAT_ARGS(slot->data.restart_lsn),
> + LSN_FORMAT_ARGS(remote_slot->restart_lsn));
>
> ======
> src/backend/replication/logical/slotsync.c
>
> 1. check_primary_info
>
> + /* No need to check further, return that we are cascading standby */
> + if (remote_in_recovery)
> + {
> + *am_cascading_standby = true;
> + ExecClearTuple(tupslot);
> + walrcv_clear_result(res);
> + CommitTransactionCommand();
> + return;
> + }
> +
> + valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull));
> + Assert(!isnull);
> +
> + if (!valid)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("exiting from slot synchronization due to bad configuration"),
> + /* translator: second %s is a GUC variable name */
> + errdetail("The primary slot \"%s\" specified by %s is not valid.",
> + PrimarySlotName, "primary_slot_name")));
> + ExecClearTuple(tupslot);
> + walrcv_clear_result(res);
> + CommitTransactionCommand();
> +}
>
> Now that there is a common cleanup/return code this function be
> reduced further like below:
>
> SUGGESTION
>
> if (remote_in_recovery)
> {
> /* No need to check further, return that we are cascading standby */
> *am_cascading_standby = true;
> }
> else
> {
> /* We are a normal standby. */
>
> valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull));
> Assert(!isnull);
>
> if (!valid)
> ...
> }
>
> ExecClearTuple(tupslot);
> walrcv_clear_result(res);
> CommitTransactionCommand();
> }
>
> ~~~
>
> 2. ReplSlotSyncWorkerMain
>
> + /*
> + * One can promote the standby and we can no longer be a cascading
> + * standby. So recheck here.
> + */
> + if (am_cascading_standby)
> + check_primary_info(wrconn, &am_cascading_standby);
>
> Minor rewording of that new comment.
>
> SUGGESTION
> If the standby was promoted then what was previously a cascading
> standby might no longer be one, so recheck each time.
>
> ======
> src/test/recovery/t/050_verify_slot_order.pl
>
> 3.
> +##################################################
> +# Test that a synchronized slot can not be decoded, altered and
> dropped by the user
> +##################################################
>
> /and dropped/or dropped/
>
> ~~~
>
> 4.
> +
> +($result, $stdout, $stderr) = $standby1->psql(
> + 'postgres',
> + qq[ALTER_REPLICATION_SLOT lsub1_slot (failover);],
> + replication => 'database');
> +ok($stderr =~ /ERROR: cannot alter replication slot "lsub1_slot"/,
> + "synced slot on standby cannot be altered");
> +
>
> Add a comment for this test part
>
> SUGGESTION
> Attempting to alter a synced slot should result in an error
>
> ~~~
>
> 5.
> IMO it would be better if the tests were done in the same order
> mentioned in the comment. So either change the tests or change the
> comment.
>
> ======
> Kind Regards,
> Peter Smith.
> Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2023-12-21 09:39:10 Re: Synchronizing slots from primary to standby
Previous Message Pavel Stehule 2023-12-21 09:35:47 Re: Autonomous transactions 2023, WIP