
From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2024-09-20 03:59:07
Message-ID: OS0PR01MB5716662BEB9C0B4E92587FAC946C2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Friday, September 20, 2024 10:55 AM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> On Friday, September 20, 2024 2:49 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> >
> > I think that such a time-based configuration parameter would be a
> > reasonable solution. The current concerns are that it might affect
> > vacuum performance and lead to a similar bug we had with
> > vacuum_defer_cleanup_age.
>
> Thanks for the feedback!
>
> I am working on the POC patch and doing some initial performance tests on
> this idea.
> I will share the results after finishing.
>
> Apart from the vacuum_defer_cleanup_age idea, we've given more thought to
> our approach for retaining dead tuples and have come up with another idea that
> can reliably detect conflicts without requiring users to choose a wise value for
> vacuum_committs_age. This new idea could also reduce the performance
> impact. Thanks a lot to Amit for the off-list discussion.
>
> The concept of the new idea is that dead tuples are only useful for detecting
> conflicts when applying *concurrent* transactions from remote nodes. Any
> subsequent UPDATE from a remote node after the dead tuples have been
> removed should have a later timestamp, meaning it's reasonable to detect an
> update_missing scenario and convert the UPDATE to an INSERT when
> applying it.
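To illustrate the rule described above, here is a minimal sketch (hypothetical names and types, not actual PostgreSQL code) of how an apply worker could classify an incoming remote UPDATE once dead tuples are retained:

```c
#include <stdbool.h>

/*
 * Hypothetical illustration: while a dead tuple for the key is still
 * retained, a remote UPDATE that finds it is a concurrent update_deleted
 * conflict; once dead tuples have been vacuumed, any remote UPDATE must
 * carry a later commit timestamp, so reporting update_missing and
 * converting the UPDATE into an INSERT is safe.
 */
typedef enum
{
    CONFLICT_NONE,            /* live tuple found: apply the UPDATE */
    CONFLICT_UPDATE_DELETED,  /* dead tuple found: concurrent remote DELETE */
    CONFLICT_UPDATE_MISSING   /* no tuple at all: convert UPDATE to INSERT */
} ApplyConflict;

ApplyConflict
classify_remote_update(bool live_tuple_found, bool dead_tuple_found)
{
    if (live_tuple_found)
        return CONFLICT_NONE;
    if (dead_tuple_found)
        return CONFLICT_UPDATE_DELETED;
    return CONFLICT_UPDATE_MISSING;
}
```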
>
> To achieve the above, we can create an additional replication slot on the
> subscriber side, maintained by the apply worker. This slot is used to retain
> the dead tuples. The apply worker will advance the slot.xmin after confirming
> that all the concurrent transactions on the publisher have been applied locally.
>
> The process of advancing the slot.xmin could be:
>
> 1) The apply worker calls GetRunningTransactionData() to get the
> 'oldestRunningXid' and considers this the 'candidate_xmin'.
> 2) The apply worker sends a new message to the walsender to request the latest
> WAL flush position (GetFlushRecPtr) on the publisher, and saves it to
> 'candidate_remote_wal_lsn'. Here we could introduce a new feedback
> message or extend the existing keepalive message (e.g., extend the
> requestReply bit in the keepalive message to add a 'request_wal_position' value).
> 3) The apply worker can continue to apply changes. After applying all the WAL
> up to 'candidate_remote_wal_lsn', the apply worker can then advance the
> slot.xmin to 'candidate_xmin'.
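The three steps above can be sketched as a pure helper function (the names candidate_xmin and candidate_remote_wal_lsn come from the proposal; the helper, its signature, and the simplified typedefs are assumptions for illustration, not a real API):

```c
typedef unsigned int TransactionId;
typedef unsigned long long XLogRecPtr;

/*
 * Hypothetical sketch of the proposed advancement logic. Returns the xmin
 * the slot may be advanced to, or the current value if the remote flush
 * position has not been fully applied yet.
 */
TransactionId
advance_slot_xmin(TransactionId current_xmin,
                  TransactionId candidate_xmin,          /* from step 1 */
                  XLogRecPtr candidate_remote_wal_lsn,   /* from step 2 */
                  XLogRecPtr last_applied_remote_lsn)
{
    /*
     * Step 3: advance only after all remote WAL up to the candidate flush
     * position has been applied; every transaction that was concurrent
     * when candidate_xmin was computed has then been replayed locally.
     */
    if (last_applied_remote_lsn >= candidate_remote_wal_lsn)
        return candidate_xmin;
    return current_xmin;
}
```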
>
> This approach ensures that dead tuples are not removed until all concurrent
> transactions have been applied. It can be effective for both bidirectional and
> non-bidirectional replication cases.
>
> We could introduce a boolean subscription option (retain_dead_tuples) to
> control whether this feature is enabled. Each subscription intending to detect
> update_deleted conflicts should set retain_dead_tuples to true.
>
> The following explains how it works in different cases to achieve data
> consistency:
...
> --
> 3 nodes, non-bidirectional, Node C subscribes to both Node A and Node B:
> --

Sorry for a typo here: the timestamps of T2 and T3 were reversed.
Please see the following correction:

>
> Node A:
> T1: INSERT INTO t (id, value) VALUES (1,1); ts=10.00 AM
> T2: DELETE FROM t WHERE id = 1; ts=10.01 AM

Here T2 should be at ts=10.02 AM

>
> Node B:
> T3: UPDATE t SET value = 2 WHERE id = 1; ts=10.02 AM

T3 should be at ts=10.01 AM

>
> Node C:
> apply T1, T2, T3
>
> After applying T2, the apply worker on Node C will check the latest WAL flush
> location on Node B. By that time, T3 should have finished, so the xmin will
> be advanced only after applying the WAL that is later than T3. So, the dead
> tuple will not be removed before applying T3, which means the
> update_deleted conflict can be detected.
>
> Your feedback on this idea would be greatly appreciated.
>
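A tiny check of the corrected scenario, under assumed LSN values (the function and its arguments are illustrative, not real APIs): after Node C applies T2, it records Node B's flush position, which already covers T3, so the dead tuple left by T2 cannot be removed until T3 has been applied.

```c
typedef unsigned long long XLogRecPtr;

/*
 * Hypothetical sketch: returns nonzero only once Node C has applied
 * Node B's WAL up to the flush point recorded when T2 was applied.
 * Until then, the dead tuple from T2 is retained and an incoming T3
 * can be reported as update_deleted rather than update_missing.
 */
int
dead_tuple_removable(XLogRecPtr node_b_flush_at_t2,  /* includes T3 */
                     XLogRecPtr applied_from_node_b)
{
    return applied_from_node_b >= node_b_flush_at_t2;
}
```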

Best Regards,
Hou zj
