RE: Conflict detection for update_deleted in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2024-09-30 06:32:42
Message-ID: OS0PR01MB571601E01C57AF6BE0DE84E694762@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Wednesday, September 25, 2024 2:23 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Sep 20, 2024 at 2:46 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Sep 20, 2024 at 8:25 AM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > Apart from the vacuum_defer_cleanup_age idea.
> > >
> >
> > I think you meant to say vacuum_committs_age idea.
> >
> > > we’ve given more thought to our
> > > approach for retaining dead tuples and have come up with another
> > > idea that can reliably detect conflicts without requiring users to
> > > choose a wise value for the vacuum_committs_age. This new idea could
> > > also reduce the performance impact. Thanks a lot to Amit for the off-list
> > > discussion.
> > >
> > > The concept of the new idea is that the dead tuples are only useful
> > > to detect conflicts when applying *concurrent* transactions from
> > > remotes. Any subsequent UPDATE from a remote node after removing the
> > > dead tuples should have a later timestamp, meaning it's reasonable
> > > to detect an update_missing scenario and convert the UPDATE to an
> > > INSERT when applying it.
> > >
> > > To achieve above, we can create an additional replication slot on
> > > the subscriber side, maintained by the apply worker. This slot is
> > > used to retain the dead tuples. The apply worker will advance the
> > > slot.xmin after confirming that all the concurrent transactions on the
> > > publisher have been applied locally.
> > >
> > > The process of advancing the slot.xmin could be:
> > >
> > > 1) the apply worker calls GetRunningTransactionData() to get the
> > > 'oldestRunningXid' and considers this as 'candidate_xmin'.
> > > 2) the apply worker sends a new message to the walsender to request the
> > > latest WAL flush position (GetFlushRecPtr) on the publisher, and saves it
> > > to 'candidate_remote_wal_lsn'. Here we could introduce a new
> > > feedback message or extend the existing keepalive message (e.g.,
> > > extend the requestReply bit in the keepalive message to add a
> > > 'request_wal_position' value).
> > > 3) The apply worker can continue to apply changes. After applying
> > > all the WAL up to 'candidate_remote_wal_lsn', the apply worker can
> > > then advance the slot.xmin to 'candidate_xmin'.
> > >
> > > This approach ensures that dead tuples are not removed until all
> > > concurrent transactions have been applied. It can be effective for
> > > both bidirectional and non-bidirectional replication cases.
> > >
> > > We could introduce a boolean subscription option
> > > (retain_dead_tuples) to control whether this feature is enabled.
> > > Each subscription intending to detect update_deleted conflicts should set
> > > retain_dead_tuples to true.
> > >
> >
> > As each apply worker needs a separate slot to retain deleted rows, the
> > requirement for slots will increase. The other possibility is to
> > maintain one slot in the launcher or some other central process that
> > traverses all subscriptions and remembers the ones marked with
> > retain_dead_rows (let's call this list retain_sub_list). Then, using
> > running_transactions, get the oldest running_xact, get the
> > remote flush location from the other node (publisher node), and store
> > those as candidate values (candidate_xmin and
> > candidate_remote_wal_lsn) in the slot. We can probably reuse the existing
> > candidate variables of the slot. Next, we can check the remote_flush
> > locations from all the origins corresponding to retain_sub_list, and if
> > all are ahead of candidate_remote_wal_lsn, we can update the slot's
> > xmin to candidate_xmin.
>
> Yeah, I think that such an idea to reduce the number required slots would be
> necessary.
>
> >
> > I think in the above idea we can add an optimization to combine the
> > requests for the remote WAL LSN from different subscriptions pointing to
> > the same node, to avoid sending multiple requests to the same node. I
> > am not sure if using pg_subscription.subconninfo is sufficient for
> > this; if not, we can probably leave out this optimization.
> >
> > If this idea is feasible then it would reduce the number of slots
> > required to retain the deleted rows but the launcher needs to get the
> > remote wal location corresponding to each publisher node. There are
> > two ways to achieve that (a) launcher requests one of the apply
> > workers corresponding to subscriptions pointing to the same publisher
> > node to get this information; (b) launcher launches another worker to
> > get the remote wal flush location.
>
> I think the remote wal flush location is requested using the replication protocol.
> Therefore, if a new worker is responsible for asking for the wal flush location
> from multiple publishers (like the idea (b)), the corresponding process would need
> to be launched on the publisher side and logical replication would also need to
> start on each connection. I think it would be better to get the remote wal flush
> location using the existing logical replication connection (i.e., between the
> logical wal sender and the apply worker), and advertise the locations in
> shared memory. Then, the central process that holds the slot to retain the
> deleted row versions traverses them and increases slot.xmin if possible.
>
> The cost of requesting the remote wal flush location would not be huge if we
> don't ask for it very frequently. So probably we can start by having each apply
> worker (in the retain_sub_list) ask for the remote wal flush location, and leave
> for later the optimization of avoiding duplicate requests to the same publisher.

Agreed. Here is the POC patch set based on this idea.

The implementation is as follows:

A subscription option is added to allow users to specify whether dead
tuples on the subscriber, which are useful for detecting update_deleted
conflicts, should be retained. The default setting is false. If set to true,
the detection of update_deleted will be enabled, and an additional replication
slot named pg_conflict_detection will be created on the subscriber to prevent
dead tuples from being removed. Note that if multiple subscriptions on one node
enable this option, only one replication slot will be created.
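
For illustration, a minimal sketch of how the launcher could gate the creation of
this single shared slot is shown below. Apart from SearchNamedReplicationSlot(),
the names here are placeholders for this description, not the identifiers used in
the patches:

#include "postgres.h"
#include "replication/slot.h"

#define CONFLICT_DETECTION_SLOT "pg_conflict_detection"

/* Assumed helpers for this sketch (not in the actual patches) */
extern bool any_subscription_retains_dead_tuples(void);
extern void create_conflict_detection_slot(const char *slotname);

static void
maybe_create_conflict_detection_slot(void)
{
    /* Nothing to do if no subscription on this node enables the option */
    if (!any_subscription_retains_dead_tuples())
        return;

    /* Only one shared slot is created, even with multiple subscriptions */
    if (SearchNamedReplicationSlot(CONFLICT_DETECTION_SLOT, true) != NULL)
        return;

    /*
     * Would wrap ReplicationSlotCreate() to make a persistent slot; the
     * exact argument list depends on the server version, so it is elided.
     */
    create_conflict_detection_slot(CONFLICT_DETECTION_SLOT);
}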

This additional slot will be used to retain dead tuples. Each apply worker will
maintain its own non-removable transaction ID by following the steps:

1) Call GetRunningTransactionData() to take oldestRunningXid as the
candidate xid, and send a new message to request the remote WAL position from
the walsender.
2) Wait (non-blocking) to receive the WAL position from the walsender.
3) After receiving the WAL position, advance the non-removable transaction ID
to the candidate xid once the local flush location has reached or surpassed the
received WAL position.

These steps are repeated at intervals defined by wal_receiver_status_interval
to minimize the performance impact. This ensures that dead tuples are not
removed until all concurrent transactions have been applied.
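
For illustration, this per-worker loop could be sketched roughly as below. Apart
from GetRunningTransactionData() and the standard lock/LSN primitives, the names
here (the phase enum, request_publisher_status(), publisher_status_received(),
and the placement of oldest_nonremovable_xid) are assumptions made for this
description, not the patch's actual code:

#include "postgres.h"
#include "access/transam.h"
#include "access/xlogdefs.h"
#include "replication/worker_internal.h"
#include "storage/lwlock.h"
#include "storage/procarray.h"
#include "storage/standby.h"

/* Assumed helpers for this sketch (not in the actual patches) */
extern void request_publisher_status(void);
extern bool publisher_status_received(XLogRecPtr *remote_flush_lsn);

typedef enum
{
    RDT_GET_CANDIDATE_XID,
    RDT_WAIT_FOR_PUBLISHER_STATUS,
    RDT_WAIT_FOR_LOCAL_FLUSH
} RetainDeadTuplesPhase;

static RetainDeadTuplesPhase phase = RDT_GET_CANDIDATE_XID;
static TransactionId candidate_xid = InvalidTransactionId;
static XLogRecPtr candidate_remote_lsn = InvalidXLogRecPtr;

static void
maybe_advance_nonremovable_xid(XLogRecPtr local_flush_lsn)
{
    switch (phase)
    {
        case RDT_GET_CANDIDATE_XID:
            {
                RunningTransactions running = GetRunningTransactionData();

                /* GetRunningTransactionData() returns with these locks held */
                LWLockRelease(ProcArrayLock);
                LWLockRelease(XidGenLock);

                candidate_xid = running->oldestRunningXid;
                request_publisher_status();     /* assumed: new feedback message */
                phase = RDT_WAIT_FOR_PUBLISHER_STATUS;
                break;
            }

        case RDT_WAIT_FOR_PUBLISHER_STATUS:
            /* assumed: non-blocking check for the walsender's reply */
            if (publisher_status_received(&candidate_remote_lsn))
                phase = RDT_WAIT_FOR_LOCAL_FLUSH;
            break;

        case RDT_WAIT_FOR_LOCAL_FLUSH:
            /* advance once everything up to the remote position is applied */
            if (local_flush_lsn >= candidate_remote_lsn)
            {
                /* assumed placement of the per-worker value described above */
                MyLogicalRepWorker->oldest_nonremovable_xid = candidate_xid;
                phase = RDT_GET_CANDIDATE_XID;
            }
            break;
    }
}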

The launcher periodically collects the oldest_nonremovable_xid from all apply
workers. It then computes the minimum transaction ID and advances the
replication slot's xmin if the slot's current xmin precedes the computed value.
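
For reference, the launcher-side advance could look roughly like the following.
The worker list and the oldest_nonremovable_xid field come from the description
above; the locking and slot-update steps mirror how physical slots advance their
xmin today, and everything else is an assumption for this sketch rather than the
patch's actual code (it also assumes the launcher has already acquired the
pg_conflict_detection slot as MyReplicationSlot):

#include "postgres.h"
#include "access/transam.h"
#include "nodes/pg_list.h"
#include "replication/slot.h"
#include "replication/worker_internal.h"
#include "storage/spin.h"

static void
advance_conflict_detection_slot_xmin(List *retain_workers)
{
    TransactionId new_xmin = InvalidTransactionId;
    ListCell   *lc;

    /* Compute the minimum oldest_nonremovable_xid across the apply workers */
    foreach(lc, retain_workers)
    {
        LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
        TransactionId xid = w->oldest_nonremovable_xid; /* field added by the patches */

        if (!TransactionIdIsValid(xid))
            return;             /* some worker has no value yet; retry later */

        if (!TransactionIdIsValid(new_xmin) ||
            TransactionIdPrecedes(xid, new_xmin))
            new_xmin = xid;
    }

    if (!TransactionIdIsValid(new_xmin))
        return;

    /* Advance the slot's xmin only forward, never backward */
    if (TransactionIdIsValid(MyReplicationSlot->data.xmin) &&
        !TransactionIdPrecedes(MyReplicationSlot->data.xmin, new_xmin))
        return;

    SpinLockAcquire(&MyReplicationSlot->mutex);
    MyReplicationSlot->data.xmin = new_xmin;
    MyReplicationSlot->effective_xmin = new_xmin;
    SpinLockRelease(&MyReplicationSlot->mutex);

    ReplicationSlotMarkDirty();
    ReplicationSlotsComputeRequiredXmin(false);
}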

I will keep testing the patch internally and analyze whether it's necessary to enable
failover for this new replication slot.

Please refer to the commit message of V2-0001 for the overall design.
The patch set is split into several parts to make the initial review easier.
Please note that the patches are interdependent and cannot work
independently.

Best Regards,
Hou zj

Attachment Content-Type Size
v2-0001-add-a-retain_dead_tuples-option-to-subscriptions.patch application/octet-stream 63.0 KB
v2-0002-Maintain-the-oldest-non-removeable-tranasction-id.patch application/octet-stream 14.8 KB
v2-0003-Maintain-the-replication-slot-in-logical-launcher.patch application/octet-stream 9.2 KB
v2-0004-Support-the-conflict-detection-for-update_deleted.patch application/octet-stream 21.3 KB
v2-0005-Add-a-tap-test-to-verify-the-new-slot-xmin-mechan.patch application/octet-stream 6.8 KB
