RE: Conflict detection for update_deleted in logical replication

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2024-10-02 02:39:21
Message-ID: OS0PR01MB5716DAA010A3CA8BF6FDBD1F94702@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Tuesday, October 1, 2024 8:44 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Sep 30, 2024 at 12:02 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Wednesday, September 25, 2024 2:23 AM Masahiko Sawada
> > <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > I think the remote wal flush location is asked using a replication protocol.
> > > Therefore, if a new worker is responsible for asking wal flush
> > > location from multiple publishers (like the idea (b)), the
> > > corresponding process would need to be launched on publisher sides
> > > and logical replication would also need to start on each connection.
> > > I think it would be better to get the remote wal flush location
> > > using the existing logical replication connection (i.e., between the
> > > logical wal sender and the apply worker), and advertise the
> > > locations on the shared memory. Then, the central process who holds the
> > > slot to retain the deleted row versions traverses them and increases
> > > slot.xmin if possible.
> > >
> > > The cost of requesting the remote wal flush location would not be
> > > huge if we don't ask it very frequently. So probably we can start by
> > > having each apply worker (in the retain_sub_list) ask the remote wal
> > > flush location and can leave the optimization of avoiding sending the
> > > request for the same publisher.
> >
> > Agreed. Here is the POC patch set based on this idea.
> >
> > The implementation is as follows:
> >
> > A subscription option is added to allow users to specify whether dead
> > tuples on the subscriber, which are useful for detecting
> > update_deleted conflicts, should be retained. The default setting is
> > false. If set to true, the detection of update_deleted will be
> > enabled,
> >
>
> I find the option name retain_dead_tuples bit misleading because by name one
> can't make out the purpose of the same. It is better to name it as
> detect_update_deleted or something on those lines.
>
> > and an additional replication
> > slot named pg_conflict_detection will be created on the subscriber to
> > prevent dead tuples from being removed. Note that if multiple
> > subscriptions on one node enable this option, only one replication slot
> > will be created.
> >
>
> In general, we should have done this by default but as detecting
> update_deleted type conflict has some overhead in terms of retaining dead
> tuples for more time, so having an option seems reasonable. But I suggest to
> keep this as a separate last patch. If we can make the core idea work by default
> then we can enable it via option in the end.

Thanks for the comments. I have renamed the option to detect_update_deleted and
made it a separate last patch in the V3 patch set.
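
For illustration only, the scheme quoted above (each apply worker advertises
its state in shared memory, and one central process traverses the entries and
increases slot.xmin if possible) might be sketched roughly as follows. This is
not taken from the attached patches. The names WorkerEntry, candidate_xmin,
xid_precedes and compute_slot_xmin are hypothetical, and how a worker derives
its candidate value from the remote WAL flush location is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a transaction ID; 0 is treated as "not set". */
typedef uint32_t Xid;
#define INVALID_XID ((Xid) 0)

/* Hypothetical per-apply-worker entry advertised in shared memory. */
typedef struct WorkerEntry
{
    bool in_use;         /* worker belongs to a subscription that retains dead tuples */
    Xid  candidate_xmin; /* oldest xid this worker still needs; INVALID_XID if unknown */
} WorkerEntry;

/* Circular comparison: true if a is logically older than b. */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Traverse the advertised entries and return the xmin the slot may use.
 * The result is the oldest candidate among in-use workers, but it never
 * moves backwards from current_xmin, and it stays at current_xmin while
 * any in-use worker has not yet advertised a candidate.
 */
static Xid
compute_slot_xmin(const WorkerEntry *workers, size_t nworkers, Xid current_xmin)
{
    Xid oldest = INVALID_XID;

    assert(workers != NULL || nworkers == 0);

    for (size_t i = 0; i < nworkers; i++)
    {
        if (!workers[i].in_use)
            continue;

        /* One worker without a candidate blocks any advancement. */
        if (workers[i].candidate_xmin == INVALID_XID)
            return current_xmin;

        if (oldest == INVALID_XID ||
            xid_precedes(workers[i].candidate_xmin, oldest))
            oldest = workers[i].candidate_xmin;
    }

    /* No participating workers: leave the slot where it is. */
    if (oldest == INVALID_XID)
        return current_xmin;

    /* Only ever advance the slot's xmin. */
    if (current_xmin == INVALID_XID || xid_precedes(current_xmin, oldest))
        return oldest;

    return current_xmin;
}
```

In this sketch the slowest subscription determines how long dead tuples are
retained, which matches having a single slot shared by all subscriptions on
the node.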

Best Regards,
Hou zj

Attachment Content-Type Size
v3-0005-Add-a-tap-test-to-verify-the-new-slot-xmin-mechan.patch application/octet-stream 6.8 KB
v3-0001-Maintain-the-oldest-non-removeable-tranasction-id.patch application/octet-stream 16.9 KB
v3-0002-Maintain-the-replication-slot-in-logical-launcher.patch application/octet-stream 8.5 KB
v3-0003-Support-the-conflict-detection-for-update_deleted.patch application/octet-stream 21.1 KB
v3-0004-Add-a-detect_update_deleted-option-to-subscriptio.patch application/octet-stream 65.6 KB
