Quick Links

RE: Fix slot synchronization with two_phase decoding enabled

From:	"Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To:	shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
Subject:	RE: Fix slot synchronization with two_phase decoding enabled
Date:	2025-04-28 11:33:12
Message-ID:	OS0PR01MB571616DB432287BA89A79A1694812@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, Apr 28, 2025 at 5:33 PM shveta malik wrote:
>
> On Mon, Apr 28, 2025 at 2:27 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Fri, Apr 18, 2025 at 12:29 PM Amit Kapila wrote:
> > >
> > > On Thu, Apr 17, 2025 at 6:14 PM Zhijie Hou (Fujitsu)
> > > <houzj(dot)fnst(at)fujitsu(dot)com>
> > > >
> > > > -----
> > > > Fix
> > > > -----
> > > >
> > > > I think we should keep the confirmed_flush even if the previous
> > > > synced restart_lsn/catalog_xmin is newer. Attachments include a
> > > > patch for the
> > > same.
> > > >
> > >
> > > This will fix the case we are facing but adds a new rule for slot
> synchronization.
> > > Can we think of a simpler way to fix this by avoiding updating other
> > > slot fields (like two_phase, two_phase_at) if restart_lsn or
> > > catalog_xmin of the local slot is ahead of the remote slot?
> >
> > Since the fix for xmin advancement during fast-forward decoding has
> > been pushed (commit aaf9e95), I'm attaching the V2 patch to address
> > the assert failure by avoiding updating other slot fields (like
> > two_phase, two_phase_at) if restart_lsn or catalog_xmin of the local slot is
> ahead.
> >
>
> The patch looks good to me. We can have minor comment tweak:
>
> + * Syncing only two_phase_at, without also syncing the latest
> + * confirmed_lsn, might lead to transactions between the old
> + * confirmed_lsn and two_phase_at being unexpectedly decoded and sent
> + * to the subscriber.
>
> We can append "following a promotion".

Thanks for reviewing. Here is V3 patch that addressed it.

BTW, I also did some tests to confirm the catalog_xmin could
still be ahead in some case, and here is an example:

1. Create a failover replication slot named 'logicalslot' on primary
and acquire it in the walsender.

2. Log two standby snapshots on primary. Before logging, call txid_current()
To assign a xid, so that each standby snapshot will hold a new xid in its oldestrunningXid field:
- txid_current();
- `0/3000420` - `running_xacts` (no running transactions, oldestrunningXid = 755)
- txid_current();
- `0/3000488` - `running_xacts` (no running transactions, oldestrunningXid = 756)

3. The walsender sets `0/3000420` as the `candidate_restart_lsn`, 755 as
`candidate_catalog_xmin`, skipping the second `running_xacts` because
`candidate_restart_lsn`/`candidate_catalog_xmin` is already valid.

4. After receiving a reply from the apply worker, the walsender assigns
`0/3000420` to `restart_lsn`, `755` to `catalog_xmin`. At this point, the
replication slot 'logicalslot' has `restart_lsn` set to `0/3000420`,
`catalog_xmin` set to `755`.

5. On the standby, execute `pg_sync_replication_slots()` to synchronize
'logicalslot'.

6. During synchronization, with the initial `restart_lsn` at `0/3000420`, the
sync slot reaches a consistent point at this position. As a result, it does
not update `candidate_restart_lsn` and `candidate_catalog_xmin` at
consistent point (refer to `SnapBuildProcessRunningXacts()`).

7. The sync process identifies the second standby snapshot at `0/3000488` and
uses its LSN as `candidate_restart_lsn`, and use the oldestrunningXid `756`
as `candidate_catalog_xmin`, eventually updating it to `restart_lsn` and
`catalog_xmin`.

8. Now, the synced slot holds `restart_lsn` at `0/3000488`, `catalog_xmin` at
`756`, which are all ahead of the remote slot on the primary server.

Attaching a script to reproduce the same.

Note that, to reproduce this stably, we'd better modify the value of
LOG_SNAPSHOT_INTERVAL_MS in bgwriter.c to a bigger number to avoid
unexpected xl_running_xacts logging.

Best Regards,
Hou zj

Attachment	Content-Type	Size
advance_catalog_xmin.sh	application/octet-stream	3.8 KB
v3-0001-Fix-assertion-failure-when-decoding-synced-two-ph.patch	application/octet-stream	4.0 KB

In response to

Re: Fix slot synchronization with two_phase decoding enabled at 2025-04-28 09:33:05 from shveta malik

Responses

Re: Fix slot synchronization with two_phase decoding enabled at 2025-04-28 21:31:21 from Masahiko Sawada
RE: Fix slot synchronization with two_phase decoding enabled at 2025-04-29 06:54:04 from Zhijie Hou (Fujitsu)

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2025-04-28 11:55:48	Re: allow changing autovacuum_max_workers without restarting
Previous Message	Shayon Mukherjee	2025-04-28 11:23:15	Re: [PATCH] Re: Proposal to Enable/Disable Index using ALTER INDEX