| From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
|---|---|
| To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
| Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
| Subject: | Re: Synchronizing slots from primary to standby |
| Date: | 2023-12-26 08:29:19 |
| Message-ID: | CAJpy0uC6t6hZVrkDM9RErCie2-rM7EETqx+AHcjAVKiB1JzYQA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Fri, Dec 22, 2023 at 7:59 PM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Fri, Dec 22, 2023 at 04:02:21PM +0530, shveta malik wrote:
> > PFA v53. Changes are:
>
> Thanks!
>
> > patch002:
> > 2) Addressed comments in [2] for v52-002.
> > 3) Fixed CFBot failure. The failure was caused by an assert in
> > wait_for_primary_slot_catchup() for null confirmed_lsn received. In
> > wait_for_primary_slot_catchup(), we had an assumption that if
> > restart_lsn is valid and 'conflicting' is also false, then we must
> > have non-null confirmed_lsn. But this is not true. It is possible to
> > get null values for confirmed_lsn and catalog_xmin if on the primary
> > server the slot is just created with a valid restart_lsn and slot-sync
> > worker has fetched the slot before the primary server could set valid
> > confirmed_lsn and catalog_xmin. In
> > pg_create_logical_replication_slot(), there is a small window between
> > CreateInitDecodingContext-->ReplicationSlotReserveWal() which sets
> > restart_lsn and DecodingContextFindStartpoint() which sets
> > confirmed_lsn. If the slot-sync worker fetches the slot in this
> > window, confirmed_lsn received will be NULL. Corrected the code to
> > remove assert and added one additional condition that confirmed_lsn
> > should be valid before moving the slot to 'r'.
> >
>
> Looking at v53-0002 commit message:
>
> It states:
>
> "
> If a logical slot on the primary is valid but is invalidated on the standby,
> then that slot is dropped and recreated on the standby in next sync-cycle.
> "
>
> and one of the reasons mentioned is:
>
> "
> - The primary changes wal_level to a level lower than logical.
> "
>
> I think that as long as there is still a logical replication slot on the primary
> that should not be possible. The primary should fail to start with messages like:
>
> "
> 2023-12-22 14:06:09.281 UTC [31824] FATAL: logical replication slot "logical_slot" exists, but wal_level < logical
> "
Yes, right. It fails in such a case.
>
> Now, if:
>
> - The standby is shut down
> - All the logical replication slots are removed on the primary
> - wal_level is set to < logical on the primary and it is restarted
>
> Then when the standby starts, the "synced" slots will be invalidated and later
> removed but not re-created on the next sync-cycle (because they don't exist
> anymore on the primary).
>
> Worth rewording that part a bit?
Yes, will change these details. Thanks!
> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
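
For readers following item 3 of the v53 changes quoted above, the corrected readiness check can be sketched roughly as below. This is only an illustration, not the patch's code: RemoteSlotInfo and remote_slot_is_ready() are hypothetical names, and XLogRecPtr, InvalidXLogRecPtr and XLogRecPtrIsInvalid are redefined locally as stand-ins so the snippet compiles on its own. It assumes that a NULL confirmed_lsn fetched from the primary is represented as an invalid LSN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins so the sketch is self-contained (assumption: 0 = invalid). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define XLogRecPtrIsInvalid(r) ((r) == InvalidXLogRecPtr)

/* Hypothetical view of the slot fields fetched from the primary. */
typedef struct RemoteSlotInfo
{
	XLogRecPtr	restart_lsn;	/* set by ReplicationSlotReserveWal() */
	XLogRecPtr	confirmed_lsn;	/* set later by DecodingContextFindStartpoint() */
	bool		conflicting;
} RemoteSlotInfo;

/*
 * Return true only when the fetched slot may be moved to the ready ('r')
 * state.  A valid restart_lsn with conflicting = false is not sufficient:
 * the slot may have been fetched in the window before confirmed_lsn was
 * set, so that case is treated as "not ready yet" instead of asserting.
 */
bool
remote_slot_is_ready(const RemoteSlotInfo *slot)
{
	if (XLogRecPtrIsInvalid(slot->restart_lsn))
		return false;
	if (slot->conflicting)
		return false;
	if (XLogRecPtrIsInvalid(slot->confirmed_lsn))
		return false;			/* fetched inside the creation window */
	return true;
}
```

A slot fetched inside the window simply stays not ready and is re-checked on a later sync cycle, which is why the assert could be dropped.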