Re: Skipping logical replication transactions on subscriber side

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2022-01-26 02:51:40
Message-ID: CAD21AoCU2PLm+SxdOdUMpjgHioMB6baOoxqNi_XHD6QQJF+RKg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 26, 2022 at 11:28 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jan 25, 2022 at 8:39 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Jan 25, 2022 at 11:58 PM David G. Johnston
> > <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Jan 25, 2022 at 7:47 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >>
> > >> Yeah, I think it's a good idea to clear the subskipxid after the first
> > >> transaction regardless of whether the worker skipped it.
> > >>
> > >
> > > So basically instead of stopping the worker with an error you suggest having the worker continue applying changes (after resetting subskipxid, and - arguably - the ?_error_* fields). Log the transaction xid mis-match as a warning in the log file as opposed to an error.
> >
> > Agreed, I think it's better to log a warning than to raise an error.
> > In the case where the user specified the wrong XID, the worker should
> > fail again due to the same error.
> >
>
> IIUC, the proposal is to compare the skip_xid with the very
> transaction the apply worker received to apply and raise a warning if
> it doesn't match with skip_xid and then continue. This seems like a
> reasonable idea but can we guarantee that it is always the first
> transaction that we want to skip? We seem to guarantee that we won't
> get something again once it is written durably/flushed on the
> subscriber side. I guess here it can happen that before the errored
> transaction, there is some empty xact, or maybe part of the stream
> (consider streaming transactions) of some xact, or there could be
> other cases as well where the server will send those xacts again.

Good point.

I guess that when the worker has entered an error loop, we can
guarantee that it fails while applying the first non-empty transaction
it receives after logical replication starts, and that transaction is
the one we'd like to skip. If a transaction that can be applied without
an error is resent after a restart, that is a problem in logical
replication itself. As you pointed out, there can be some empty
transactions before the transaction in question, since we don't
advance the replication origin LSN when a transaction is empty. The
same is probably true for a streamed transaction that is rolled back
and for ROLLBACK PREPARED transactions. So could we also skip clearing
subskipxid if the transaction is empty? That is, we make sure to clear
it only after handling the first non-empty transaction. We would need
to think about this solution carefully; otherwise ALTER SUBSCRIPTION
SKIP could end up not working at all in some cases.
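
To make the idea concrete, here is a rough sketch of the decision logic, written as a small Python simulation rather than actual apply worker code. All names in it (Xact, Subscription, apply_stream, the action strings) are hypothetical and only for illustration; skip_xid stands in for subskipxid, and the sketch does not model an apply actually failing.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Xact:
    """Hypothetical stand-in for a transaction received by the apply worker."""
    xid: int
    changes: List[str] = field(default_factory=list)  # no changes = empty xact


@dataclass
class Subscription:
    """Hypothetical stand-in for the subscription state."""
    skip_xid: Optional[int] = None  # plays the role of subskipxid


def apply_stream(sub: Subscription, xacts: List[Xact]) -> List[Tuple[int, str]]:
    """Return the (xid, action) decision taken for each received transaction."""
    log = []
    for x in xacts:
        if not x.changes:
            # Empty transaction: the origin LSN is not advanced, so it may be
            # resent after a restart. Leave skip_xid untouched.
            log.append((x.xid, "empty"))
        elif sub.skip_xid is None:
            log.append((x.xid, "apply"))
        elif x.xid == sub.skip_xid:
            # First non-empty transaction matches: skip it, then clear.
            log.append((x.xid, "skip"))
            sub.skip_xid = None
        else:
            # First non-empty transaction does not match: warn, apply, clear.
            log.append((x.xid, "warn+apply"))
            sub.skip_xid = None
    return log
```

With skip_xid = 740 and a stream of an empty transaction 738 followed by non-empty 740 and 741, this would record "empty", "skip", "apply", and skip_xid would be cleared only once 740 has been handled. If the user specified a wrong XID, the first non-empty transaction gets "warn+apply" and, in the real worker, would presumably fail again with the same error.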

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
