Re: Conflict detection for update_deleted in logical replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2024-09-24 18:22:36
Message-ID: CAD21AoB4cJ-=e7J=i1-MKHoRoApJvJzknzmUFa=0SJ+a8ekKrg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 20, 2024 at 2:46 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Sep 20, 2024 at 8:25 AM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Apart from the vacuum_defer_cleanup_age idea.
> >
>
> I think you meant to say vacuum_committs_age idea.
>
> > We’ve given more thought to our approach for retaining dead tuples and
> > have come up with another idea that can reliably detect conflicts
> > without requiring users to choose a suitable value for
> > vacuum_committs_age. This new idea could also reduce the performance
> > impact. Thanks a lot to Amit for the off-list discussion.
> >
> > The concept of the new idea is that dead tuples are only useful for
> > detecting conflicts when applying *concurrent* transactions from remote
> > nodes. Any subsequent UPDATE from a remote node after the dead tuples
> > have been removed should have a later timestamp, meaning it's
> > reasonable to detect an update_missing scenario and convert the UPDATE
> > to an INSERT when applying it.
> >
> > To achieve the above, we can create an additional replication slot on
> > the subscriber side, maintained by the apply worker. This slot is used
> > to retain the dead tuples. The apply worker will advance the slot.xmin
> > after confirming that all the concurrent transactions on the publisher
> > have been applied locally.
> >
> > The process of advancing the slot.xmin could be:
> >
> > 1) the apply worker calls GetRunningTransactionData() to get the
> > 'oldestRunningXid' and considers this the 'candidate_xmin'.
> > 2) the apply worker sends a new message to the walsender to request the
> > latest wal flush position (GetFlushRecPtr) on the publisher, and saves
> > it as 'candidate_remote_wal_lsn'. Here we could introduce a new
> > feedback message or extend the existing keepalive message (e.g., extend
> > the requestReply bit in the keepalive message to add a
> > 'request_wal_position' value).
> > 3) The apply worker can continue to apply changes. After applying all
> > the WAL up to 'candidate_remote_wal_lsn', the apply worker can then
> > advance the slot.xmin to 'candidate_xmin'.
> >
> > This approach ensures that dead tuples are not removed until all concurrent
> > transactions have been applied. It can be effective for both bidirectional and
> > non-bidirectional replication cases.
> >
> > We could introduce a boolean subscription option (retain_dead_tuples)
> > to control whether this feature is enabled. Each subscription intending
> > to detect update_deleted conflicts should set retain_dead_tuples to
> > true.
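
To make sure I understand the sequencing, here is a rough sketch of what
one cycle of the apply worker could look like under this idea. It is only
an illustration: request_publisher_flush_lsn() and wait_for_apply_upto()
are hypothetical placeholders for the new feedback message and for the
normal apply loop catching up to the given remote LSN; the rest uses
existing backend functions.

#include "postgres.h"

#include "access/xlogdefs.h"
#include "replication/slot.h"
#include "storage/lwlock.h"
#include "storage/procarray.h"
#include "storage/spin.h"
#include "storage/standby.h"

/* hypothetical helpers, not existing functions */
extern XLogRecPtr request_publisher_flush_lsn(void);
extern void wait_for_apply_upto(XLogRecPtr remote_lsn);

static void
maybe_advance_conflict_slot_xmin(void)
{
    RunningTransactions running;
    TransactionId   candidate_xmin;
    XLogRecPtr      candidate_remote_wal_lsn;

    /* 1) take the oldest xid still running on the subscriber */
    running = GetRunningTransactionData();
    candidate_xmin = running->oldestRunningXid;
    LWLockRelease(ProcArrayLock);   /* GetRunningTransactionData() left these held */
    LWLockRelease(XidGenLock);

    /* 2) ask the publisher for its current WAL flush position (hypothetical) */
    candidate_remote_wal_lsn = request_publisher_flush_lsn();

    /*
     * 3) keep applying until everything up to that position has been
     * applied locally (hypothetical helper standing in for the regular
     * apply loop).
     */
    wait_for_apply_upto(candidate_remote_wal_lsn);

    /*
     * No remote transaction that was concurrent with candidate_xmin can
     * still arrive, so dead tuples older than it are no longer needed for
     * conflict detection.
     */
    SpinLockAcquire(&MyReplicationSlot->mutex);
    MyReplicationSlot->data.xmin = candidate_xmin;
    MyReplicationSlot->effective_xmin = candidate_xmin;
    SpinLockRelease(&MyReplicationSlot->mutex);
    ReplicationSlotMarkDirty();
    ReplicationSlotsComputeRequiredXmin(false);
}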
> >
>
> As each apply worker needs a separate slot to retain deleted rows, the
> requirement for slots will increase. The other possibility is to
> maintain one slot in the launcher or some other central process that
> traverses all subscriptions and remembers the ones marked with
> retain_dead_tuples (let's call this list retain_sub_list). Then, using
> running_transactions, get the oldest running xact, and also get the
> remote flush location from the other node (the publisher node), and
> store those as candidate values (candidate_xmin and
> candidate_remote_wal_lsn) in the slot. We can probably reuse the
> existing candidate variables of the slot. Next, we can check the
> remote_flush locations from all the origins corresponding to
> subscriptions in retain_sub_list, and if all are ahead of
> candidate_remote_wal_lsn, we can update the slot's xmin to
> candidate_xmin.

Yeah, I think such an idea to reduce the number of required slots would
be necessary.
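
To convince myself that the launcher-side check would stay simple, here
is a minimal sketch of that part, assuming the launcher has already
collected candidate_xmin and candidate_remote_wal_lsn as you describe.
RetainSub and retain_sub_list are assumed data structures for this
illustration; replorigin_get_progress() is the existing function to read
an origin's applied position.

#include "postgres.h"

#include "nodes/pg_list.h"
#include "replication/origin.h"

/* assumed per-subscription entry in retain_sub_list */
typedef struct RetainSub
{
    Oid         subid;      /* subscription with retain_dead_tuples = true */
    RepOriginId originid;   /* its replication origin */
} RetainSub;

/*
 * Return true once every subscription in retain_sub_list has applied
 * (and locally flushed) all publisher WAL up to candidate_remote_wal_lsn,
 * i.e. once it is safe to set the shared slot's xmin to candidate_xmin.
 */
static bool
can_advance_shared_slot_xmin(List *retain_sub_list,
                             XLogRecPtr candidate_remote_wal_lsn)
{
    ListCell   *lc;

    foreach(lc, retain_sub_list)
    {
        RetainSub  *sub = (RetainSub *) lfirst(lc);
        XLogRecPtr  applied;

        /* remote LSN this origin has applied and flushed locally */
        applied = replorigin_get_progress(sub->originid, true);

        if (applied < candidate_remote_wal_lsn)
            return false;   /* some concurrent remote changes still pending */
    }

    return true;
}

If that check passes for all origins, the slot's xmin can be advanced in
much the same way a physical slot's xmin is advanced from hot-standby
feedback.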

>
> I think in the above idea we can add an optimization to combine the
> requests for the remote wal LSN from different subscriptions pointing
> to the same node, to avoid sending multiple requests to the same node.
> I am not sure if using pg_subscription.subconninfo is sufficient for
> this; if not, we can probably leave out this optimization.
>
> If this idea is feasible then it would reduce the number of slots
> required to retain the deleted rows, but the launcher needs to get the
> remote wal location corresponding to each publisher node. There are
> two ways to achieve that: (a) the launcher requests one of the apply
> workers corresponding to subscriptions pointing to the same publisher
> node to get this information; (b) the launcher launches another worker
> to get the remote wal flush location.

I think the remote wal flush location has to be requested via the
replication protocol. Therefore, if a new worker were responsible for
requesting the wal flush location from multiple publishers (as in idea
(b)), a corresponding process would need to be launched on each
publisher side and logical replication would also need to be started on
each of those connections. I think it would be better to get the remote
wal flush location using the existing logical replication connection
(i.e., between the logical walsender and the apply worker), and
advertise the locations in shared memory. Then, the central process
that holds the slot to retain the deleted row versions traverses them
and advances slot.xmin if possible.
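
To be a bit more concrete about advertising the locations in shared
memory, I imagine something like the following per-apply-worker entry
(all names are made up for illustration; this is not existing code):

#include "postgres.h"

#include "access/xlogdefs.h"
#include "storage/spin.h"

/*
 * Hypothetical per-apply-worker entry in shared memory.  The apply worker
 * fills in the publisher's flush LSN that it learned over its own
 * replication connection, plus how far it has applied (and flushed)
 * locally.  The central process that owns the retention slot only reads
 * these entries; it needs no extra connections to the publishers.
 */
typedef struct RetainDeadTuplesWorkerEntry
{
    Oid         subid;              /* subscription served by this worker */
    XLogRecPtr  remote_flush_lsn;   /* latest flush LSN reported by publisher */
    XLogRecPtr  last_applied_lsn;   /* remote LSN applied and flushed locally */
    slock_t     mutex;              /* protects the two LSN fields */
} RetainDeadTuplesWorkerEntry;

The apply worker would update remote_flush_lsn whenever the walsender
reports its flush position (e.g., via the keepalive reply), and
last_applied_lsn after it has flushed the applied changes locally; the
central process just scans the entries and advances slot.xmin once every
entry's last_applied_lsn has passed the candidate_remote_wal_lsn it
recorded.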

The cost of requesting the remote wal flush location would not be
significant as long as we don't request it very frequently. So we can
probably start by having each apply worker (in retain_sub_list) request
the remote wal flush location, and leave the optimization of avoiding
duplicate requests to the same publisher for later.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
