Re: Conflict detection for update_deleted in logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Conflict detection for update_deleted in logical replication
Date: 2024-10-01 12:43:33
Message-ID: CAA4eK1J=eKf_-AObSdXp7xurjTQqA62Ls7ZJjajcz0wkE4DkQQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 30, 2024 at 12:02 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Wednesday, September 25, 2024 2:23 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > I think the remote wal flush location is asked using a replication protocol.
> > Therefore, if a new worker is responsible for asking wal flush location from
> > multiple publishers (like the idea (b)), the corresponding process would need
> > to be launched on publisher sides and logical replication would also need to
> > start on each connection. I think it would be better to get the remote wal flush
> > location using the existing logical replication connection (i.e., between the
> > logical wal sender and the apply worker), and advertise the locations on the
> > shared memory. Then, the central process who holds the slot to retain the
> > deleted row versions traverses them and increases slot.xmin if possible.
> >
> > The cost of requesting the remote wal flush location would not be huge if we
> > don't ask for it very frequently. So probably we can start by having each apply
> > worker (in the retain_sub_list) ask for the remote wal flush location, and leave
> > the optimization of avoiding duplicate requests to the same publisher for later.
>
> Agreed. Here is the POC patch set based on this idea.
>
> The implementation is as follows:
>
> A subscription option is added to allow users to specify whether dead
> tuples on the subscriber, which are useful for detecting update_deleted
> conflicts, should be retained. The default setting is false. If set to true,
> the detection of update_deleted will be enabled,
>

I find the option name retain_dead_tuples a bit misleading because the
name alone doesn't convey its purpose. It would be better to name it
detect_update_deleted or something along those lines.

> and an additional replication
> slot named pg_conflict_detection will be created on the subscriber to prevent
> dead tuples from being removed. Note that if multiple subscriptions on one node
> enable this option, only one replication slot will be created.
>

Ideally, we would do this by default, but detecting update_deleted
conflicts has some overhead because dead tuples are retained for
longer, so having an option seems reasonable. However, I suggest
keeping the option as a separate, last patch. If we can first make the
core idea work by default, then we can enable it via the option at the
end.
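
For readers following the thread, the quoted design (apply workers advertise state in shared memory, and one central process holding the slot advances slot.xmin when possible) can be sketched roughly as below. This is only an illustrative model in Python, not code from the POC patch: the names WorkerState, candidate_xid, applied_lsn, safe_xid and advance_slot_xmin are hypothetical, the rule "an xid is safe once the worker has applied everything up to the remote flush location" is an assumption about how the two pieces fit together, and xid wraparound is ignored.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WorkerState:
    """What one apply worker advertises in (simulated) shared memory."""
    candidate_xid: int      # assumption: oldest xid whose dead tuples may still be needed
    remote_flush_lsn: int   # publisher WAL flush location, fetched over the existing connection
    applied_lsn: int        # assumption: how far this worker has applied remote changes


def safe_xid(w: WorkerState) -> Optional[int]:
    """Return the candidate xid only once the worker has caught up to the remote flush location."""
    return w.candidate_xid if w.applied_lsn >= w.remote_flush_lsn else None


def advance_slot_xmin(slot_xmin: int, workers: Dict[str, WorkerState]) -> int:
    """Central process: move xmin to the minimum safe xid across all workers, never backwards."""
    safe = [safe_xid(w) for w in workers.values()]
    if not safe or any(x is None for x in safe):
        return slot_xmin  # some worker has not caught up yet; keep retaining dead tuples
    return max(slot_xmin, min(safe))


workers = {
    "sub1": WorkerState(candidate_xid=110, remote_flush_lsn=500, applied_lsn=500),
    "sub2": WorkerState(candidate_xid=105, remote_flush_lsn=300, applied_lsn=300),
}
new_xmin = advance_slot_xmin(100, workers)
```

In this model a single lagging worker blocks any advance, which matches the intent of having one shared slot for all subscriptions that enable the option.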

--
With Regards,
Amit Kapila.
