Re: Conflict detection for update_deleted in logical replication

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2024-09-10 09:55:58
Message-ID: CAJpy0uCHkq_DPjNAPEyqUkAQNZtS443CLHTFPrPzAj-Scby_MQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 10, 2024 at 1:40 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Tuesday, September 10, 2024 2:45 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > ---
> > > THE DESIGN
> > > ---
> > >
> > > To achieve the above, we plan to allow the logical walsender to
> > > maintain and advance the slot.xmin to protect the data in the user
> > > table and introduce a new logical standby feedback message. This
> > > message reports a WAL position that has been replayed on the logical
> > > standby *AND* for which the changes occurring on the logical standby before that
> > > WAL position have also been replayed to the walsender's node (where the
> > > walsender is running). After receiving the new feedback message, the
> > > walsender will advance the slot.xmin based on the flush info, similar
> > > to the advancement of catalog_xmin. Currently, the effective_xmin/xmin
> > > of a logical slot are unused during logical replication, so I think it's safe and
> > > won't cause side effects to reuse the xmin for this feature.
> > >
> > > We have introduced a new subscription option
> > > (feedback_slots='slot1,...'), where these slots will be used to check
> > > condition (b): the transactions on logical standbys occurring before
> > > the replay of Node A's DELETE are replayed on Node A as well.
> > > Therefore, on Node B, users should specify the slots corresponding to
> > > Node A in this option. The apply worker will get the oldest confirmed
> > > flush LSN among the specified slots and send the LSN as a feedback
> > > message to the walsender. -- I also thought of making this automatic, e.g.,
> > > letting the apply worker select the slots that are acquired by the walsenders
> > > which connect to the same remote server (e.g., if the apply worker's
> > > connection info or some other flags are the same as the walsender's
> > > connection info). But it seems tricky because if some slots are
> > > inactive, which means the walsenders are not there, the apply worker
> > > cannot find the correct slots to check unless we save the host along with
> > > the slot's persistence data.
> > >
> > > The new feedback message is sent only if feedback_slots is not NULL.
> > > If the slots in feedback_slots are removed, a final message containing
> > > InvalidXLogRecPtr will be sent to inform the walsender to forget about
> > > the slot.xmin.
> > >
> > > To detect update_deleted conflicts during update operations, if the
> > > target row cannot be found, we perform an additional scan of the table using
> > > SnapshotAny.
> > > This scan aims to locate the most recently deleted row that matches
> > > the old column values from the remote update operation and has not yet
> > > been removed by VACUUM. If any such tuples are found, we report the
> > > update_deleted conflict along with the origin and transaction information
> > > that deleted the tuple.
> > >
> > > Please refer to the attached POC patch set which implements the above
> > > design. The patch set is split into several parts to make it easier for the
> > > initial review.
> > > Please note that the patches are interdependent and cannot work
> > > independently.
> > >
> > > Thanks a lot to Kuroda-San and Amit for the off-list discussion.
> > >
> > > Suggestions and comments are highly appreciated !
> > >
> >
> > Thank you, Hou-San, for explaining the design. But to make it easier to
> > understand, would you be able to explain the sequence/timeline of the
> > *new* actions performed by the walsender and the apply processes for the
> > given example, along with the new feedback_slots config needed?
> >
> > Node A: (Procs: walsenderA, applyA)
> > T1: INSERT INTO t (id, value) VALUES (1,1); ts=10.00 AM
> > T2: DELETE FROM t WHERE id = 1; ts=10.02 AM
> >
> > Node B: (Procs: walsenderB, applyB)
> > T3: UPDATE t SET value = 2 WHERE id = 1; ts=10.01 AM
>
> Thanks for reviewing! Let me elaborate further on the example:
>
> On Node A, feedback_slots should include the logical slot that is used to replicate changes
> from Node A to Node B. On Node B, feedback_slots should include the logical
> slot that replicates changes from Node B to Node A.
>
> Assume the slot.xmin on Node A has been initialized to a valid number (740) before the
> following flow:
>
> Node A executed T1 - 10.00 AM
> T1 replicated and applied on Node B - 10.0001 AM
> Node B executed T3 - 10.01 AM
> Node A executed T2 (741) - 10.02 AM
> T2 replicated and applied on Node B (delete_missing) - 10.03 AM

Not related to this feature, but do you mean delete_origin_differ here?

> T3 replicated and applied on Node A (new action, detect update_deleted) - 10.04 AM
>
> (new action) Apply worker on Node B has confirmed that T2 has been applied
> locally and the transactions before T2 (e.g., T3) have been replicated and
> applied to Node A (e.g. feedback_slot.confirmed_flush_lsn >= LSN of the locally
> replayed T2), thus sends the new feedback message to Node A. - 10.05 AM
>
> (new action) Walsender on Node A receives the message and would advance the slot.xmin. - 10.06 AM
>
> Then, after the slot.xmin is advanced to a number greater than 741, VACUUM would be able to
> remove the dead tuple on Node A.
>

Thanks for the example. Can you please review the below and let me know if
my understanding is correct?

1)
In a bidirectional replication setup, the user has to create slots in
such a way that Node A's sub's slot is Node B's feedback_slot and Node B's
sub's slot is Node A's feedback_slot. Only then will this feature work
correctly, is that right?
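To restate 1) concretely, below is a rough sketch of what I think the
setup would look like (slot/publication names and connection strings are
just placeholders, and I am assuming feedback_slots can be given in
WITH (...) like any other subscription option; origin = none is only there
because that is the usual bidirectional setup, nothing new here):

-- On Node A: subscribe to B. The slot of B's subscription (b_sub) lives
-- on A and replicates A's changes to B, so it becomes A's feedback slot.
CREATE SUBSCRIPTION a_sub
    CONNECTION 'host=nodeB dbname=postgres'
    PUBLICATION b_pub
    WITH (origin = none, feedback_slots = 'b_sub');

-- On Node B: symmetrically, A's subscription slot (a_sub) lives on B and
-- replicates B's changes to A, so it becomes B's feedback slot.
CREATE SUBSCRIPTION b_sub
    CONNECTION 'host=nodeA dbname=postgres'
    PUBLICATION a_pub
    WITH (origin = none, feedback_slots = 'a_sub');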

2)
Now coming back to multiple feedback_slots in a subscription, is the
below correct:

Say Node A has publications and subscriptions as follows:
------------------
A_pub1

A_sub1 (subscribing to B_pub1 with the default slot_name of A_sub1)
A_sub2 (subscribing to B_pub2 with the default slot_name of A_sub2)
A_sub3 (subscribing to B_pub3 with the default slot_name of A_sub3)

Say Node B has publications and subscriptions as follows:
------------------
B_sub1 (subscribing to A_pub1 with the default slot_name of B_sub1)

B_pub1
B_pub2
B_pub3

Then what will be the feedback_slots configuration for all
subscriptions of A and B? Is the below correct?
------------------
A_sub1, A_sub2, A_sub3: feedback_slots=B_sub1
B_sub1: feedback_slots=A_sub1, A_sub2, A_sub3
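
Or, expressed as commands (assuming, just for illustration, that
feedback_slots can also be changed via ALTER SUBSCRIPTION ... SET; I have
not checked whether the patch allows that):

-- On Node A:
ALTER SUBSCRIPTION A_sub1 SET (feedback_slots = 'B_sub1');
ALTER SUBSCRIPTION A_sub2 SET (feedback_slots = 'B_sub1');
ALTER SUBSCRIPTION A_sub3 SET (feedback_slots = 'B_sub1');

-- On Node B:
ALTER SUBSCRIPTION B_sub1 SET (feedback_slots = 'A_sub1, A_sub2, A_sub3');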

3)
If the above is true, then do we have a way to make sure that the user
has given this configuration exactly in the above way? If users end up
giving feedback_slots as some random slot (say A_slot4) or an incomplete
list, do we validate that? (I have not looked at the code yet, just
trying to understand the design first.)

4)
Now coming to this:

> The apply worker will get the oldest
> confirmed flush LSN among the specified slots and send the LSN as a feedback
> message to the walsender.

There will be one apply worker on B, which will be due to B_sub1; so
will it check the confirmed_lsn of all the slots A_sub1, A_sub2, A_sub3? Won't
it be sufficient to check the confirmed_lsn of, say, slot A_sub1 alone, which
has subscribed to table 't' on which the delete has been performed? The rest
of the slots (A_sub2, A_sub3) might have subscribed to different
tables?
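
IOW, I am assuming the apply worker for B_sub1 would effectively compute
something like the below locally on Node B (where the slots A_sub1, A_sub2,
A_sub3 live) and report the result in the new feedback message
(a hypothetical query, only to confirm my understanding of the design):

-- Oldest confirmed flush LSN among the configured feedback slots
SELECT min(confirmed_flush_lsn)
FROM pg_replication_slots
WHERE slot_name IN ('A_sub1', 'A_sub2', 'A_sub3');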

thanks
Shveta
