Re: Skipping logical replication transactions on subscriber side

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-12-03 06:41:47
Message-ID: CAD21AoAzxB+7U8QWjHnaHY-oa-SApBLc4VqDjVewQgkem1Prmw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 3, 2021 at 11:53 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Dec 2, 2021 at 8:38 PM Peter Eisentraut
> <peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:
> >
> > On 02.12.21 07:48, Amit Kapila wrote:
> > > a. ALTER SUBSCRIPTION ... [SET|RESET] SKIP TRANSACTION xxx;
> > > b. Alter Subscription <sub_name> SET ( subscription_parameter [=value]
> > > [, ... ] );
> > > c. Alter Subscription <sub_name> On Error ( subscription_parameter
> > > [=value] [, ... ] );
> > > d. Alter Subscription <sub_name> SKIP ( subscription_parameter
> > > [=value] [, ... ] );
> > > where subscription_parameter can be one of:
> > > xid = <xid_val>
> > > lsn = <lsn_val>
> > > ...
> >
> > > As per discussion till now, option (d) seems preferable.
> >
> > I agree.

+1

> >
> > > In this, we
> > > need to see how and what to allow as options. The simplest way for the
> > > first version is to just allow one xid to be specified at a time, which
> > > would mean that specifying multiple xids should error out. We can also
> > > allow specifying operations like 'insert', 'update',
> > > etc., and then a relation list (list of OIDs). That would mean
> > > that, for a given transaction, we could specify which particular
> > > operations and relations we want to skip.
> >
> > I don't know how difficult it would be, but allowing multiple xids might
> > be desirable.
> >
>
> Are there many cases where there could be multiple xid failures that
> the user can skip? The apply worker keeps looping on the same error,
> so the user wouldn't know of a second xid failure (if any) until the
> first failure is resolved. I can think of one such case, during the
> initial synchronization phase, where the apply worker has gone ahead
> of the tablesync worker by skipping the changes for the corresponding
> table. After that, it is possible that the tablesync worker fails
> during the catch-up phase while the apply worker fails during the
> processing of some other relation.
>
> > But this syntax gives you flexibility, so we can also
> > start with a simple implementation.
> >
>
> Yeah, I also think so. BTW, what do you think of providing extra
> flexibility of giving other options like 'operation', 'rel' along with
> xid? I think such options could be useful for large transactions that
> operate on multiple tables as it is quite possible that only a
> particular operation from the entire transaction is the cause of
> failure. Now, on one side, we can argue that skipping the entire
> transaction is better from the consistency point of view but I think
> it is already possible that we just skip a particular update/delete
> (if the corresponding tuple doesn't exist on the subscriber). For the
> sake of simplicity, we can just allow providing an xid at this stage and
> then extend it later as required, but I am not very sure about that.

+1

Skipping a whole transaction by specifying xid would be a good start.
Ideally, we'd like to automatically skip only the operations within the
transaction that fail, but that does not seem easy to achieve. If we
allow specifying operations and/or relations, multiple operations or
relations will probably need to be specified in some cases. Otherwise,
the subscriber cannot continue logical replication if the transaction
has multiple failing operations on different relations. But, similar
to the idea of specifying multiple xids, we need to note that the user
wouldn't know of a second failing operation until the apply worker
tries to apply that change. So I'm not sure there are many practical
use cases where users can specify multiple operations and relations in
order to skip the changes that fail to apply.
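For illustration only, here is a minimal hypothetical sketch in C of the xid-only starting point: the command accepts exactly one xid (erroring out otherwise, as suggested upthread), and the apply worker discards the whole remote transaction whose xid matches, then clears the setting. SkipOption, skip_option_set() and should_skip_transaction() are invented names, and TransactionId/InvalidTransactionId are simplified stand-ins, not PostgreSQL's real definitions or the actual patch.

```c
/*
 * Hypothetical model of skipping one remote transaction by xid.
 * Not PostgreSQL code: all names and types here are simplified
 * stand-ins for illustration only.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

typedef struct SkipOption
{
    TransactionId xid;          /* remote xid to skip, or InvalidTransactionId */
} SkipOption;

/*
 * Store the xid list from a hypothetical "SKIP (xid = ...)" command.
 * The first version accepts exactly one valid xid; anything else is
 * rejected (the caller would raise an ERROR).
 */
static bool
skip_option_set(SkipOption *opt, const TransactionId *xids, size_t nxids)
{
    assert(opt != NULL);
    if (nxids != 1 || xids == NULL || xids[0] == InvalidTransactionId)
        return false;
    opt->xid = xids[0];
    return true;
}

/*
 * Called when the apply worker sees BEGIN for a remote transaction.
 * Returns true if all changes of that transaction should be discarded.
 * The option is one-shot: it is cleared once the matching xid is seen,
 * so later transactions are applied normally.
 */
static bool
should_skip_transaction(SkipOption *opt, TransactionId remote_xid)
{
    if (opt->xid == InvalidTransactionId || opt->xid != remote_xid)
        return false;
    opt->xid = InvalidTransactionId;
    return true;
}
```

Because the apply worker stops at the first failing transaction, a single one-shot xid is enough to get past it; supporting several xids, operations, or relations would mean turning the single field into a list.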

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
