RE: Conflict detection for update_deleted in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2024-11-26 08:18:40
Message-ID: OS0PR01MB5716107452FA591272B95E92942F2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Monday, November 25, 2024 5:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Nov 21, 2024 at 3:03 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Attach the V10 patch set which addressed above comments
> > and fixed a CFbot warning due to un-initialized variable.
> >
>
> We should make the v10_2-0001* as the first main patch for review till
> we have a consensus to resolve LSN<->Timestamp inversion issue. This
> is because v10_2 doesn't rely on the correctness of LSN<->Timestamp
> mapping. Now, say in some later release, we fix the LSN<->Timestamp
> inversion issue, we can simply avoid sending remote_xact information
> and it will behave the same as your v10_1 approach.

Agreed.

>
> Comments on v10_2_0001*:
> ======================
> 1.
> +/*
> + * The phases involved in advancing the non-removable transaction ID.
> + *
> + * Refer to maybe_advance_nonremovable_xid() for details on how the
> function
> + * transitions between these phases.
> + */
> +typedef enum
> +{
> + DTR_GET_CANDIDATE_XID,
> + DTR_REQUEST_PUBLISHER_STATUS,
> + DTR_WAIT_FOR_PUBLISHER_STATUS,
> + DTR_WAIT_FOR_LOCAL_FLUSH
> +} DeadTupleRetainPhase;
>
> First, can we have a better name for this enum like
> RetainConflictInfoPhase or something like that? Second, the phase
> transition is not very clear from the comments atop
> maybe_advance_nonremovable_xid. You can refer to comments atop
> tablesync.c or snapbuild.c to see other cases where we have explained
> phase transitions.

I agree that RetainConflictInfoPhase is better so changed as suggested.
And I improve the comments a bit in the V11 patch set.

>
> 2.
> + * Wait for the status from the walsender. After receiving the first status
> + * after acquiring a new candidate transaction ID, do not proceed if there
> + * are ongoing concurrent remote transactions.
>
> In this part of the comments: " .. after acquiring a new candidate
> transaction ID ..." appears misplaced.

Removed.

>
> 3. In maybe_advance_nonremovable_xid(), the handling of each phase
> looks ad-hoc though I see that you have done that have so that you can
> handle the phase change functionality after changing the phase
> immediately. If we have to ever extend this functionality, it will be
> tricky to handle the new phase or at least the code will become
> complicated. How about handling each phase in the order of their
> occurrence and having separate functions for each phase as we have in
> apply_dispatch()? That way it would be convenient to invoke the phase
> handling functionality even if it needs to be called multiple times in
> the same function.

Agreed, I have split them into different functions.

>
> 4.
> /*
> + * An invalid position indiates the publisher is also
> + * a physical standby. In this scenario, advancing the
> + * non-removable transaction ID is not supported. This
> + * is because the logical walsender on the standby can
> + * only get the WAL replay position but there may be
> + * more WALs that are being replicated from the
> + * primary and those WALs could have earlier commit
> + * timestamp. Refer to
> + * maybe_advance_nonremovable_xid() for details.
> + */
> + if (XLogRecPtrIsInvalid(remote_lsn))
> + {
> + ereport(WARNING,
> + errmsg("cannot get the latest WAL position from the publisher"),
> + errdetail("The connected publisher is also a standby server."));
> +
> + /*
> + * Continuously revert to the request phase until
> + * the standby server (publisher) is promoted, at
> + * which point a valid WAL position will be
> + * received.
> + */
> + phase = DTR_REQUEST_PUBLISHER_STATUS;
> + }
>
> Shouldn't this be an ERROR as the patch doesn't support this case? The
> same should be true for later patches for the subscription option.

Yes, I changed the log level to ERROR in V11 patch set.

In V11, in addition to addressing above comments, I improved the speed of
advancing the non-removable transaction ID a bit:

In V10_2, the apply worker needs to wait for all the running transactions To
finish before advancing the non-removable transaction ID. That could delay the
advancement if there are long-running transactions on the publisher. I have
improved it to only wait for the transactions that might be in commit phase on
the publisher. I used the condition (DELAY_CHKPT_START & delayChkptFlags) to
determine whether a txn is in commit phase. Before f21bb9cfb564, that flag is
used for inCommit transactions as well which is similar to our situation. Ofc,
We could also add a new flag if necessary.

Also, I added a warning and comments for the case when
The clock on the publisher is behind that of the subscriber, as dead tuples
could be removed prematurely in this case.

Best Regards,
Hou zj

Attachment Content-Type Size
v11-0005-Add-a-tap-test-to-verify-the-new-slot-xmin-mecha.patch application/octet-stream 7.5 KB
v11-0001-Maintain-the-oldest-non-removeable-tranasction-I.patch application/octet-stream 28.8 KB
v11-0002-Maintain-the-replication-slot-in-logical-launche.patch application/octet-stream 11.9 KB
v11-0003-Support-the-conflict-detection-for-update_delete.patch application/octet-stream 21.3 KB
v11-0004-Add-a-detect_update_deleted-option-to-subscripti.patch application/octet-stream 71.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-11-26 09:12:06 Re: SQL:2023 JSON simplified accessor support
Previous Message Shubham Khanna 2024-11-26 08:07:31 Re: Improve the error message for logical replication of regular column to generated column.