Re: Logical replication timeout problem

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Logical replication timeout problem
Date: 2023-01-16 16:27:50
Message-ID: CAExHW5u93iKb_DPvXG=FcOPOMU7=tCYUbGeha7-qgxZLiu0JhA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 9, 2023 at 4:08 PM wangw(dot)fnst(at)fujitsu(dot)com
<wangw(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Fri, Jan 6, 2023 at 15:06 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> I'm not sure if we need to add global variables or member variables for a
> cumulative count that is only used here. How would you feel if I add some
> comments when declaring this static variable?

I see WalSndUpdateProgress::sendTime is static already. So this seems
fine. A comment will help sure.

>
> > +
> > + if (!ctx->update_progress)
> > + return;
> > +
> > + Assert(!ctx->fast_forward);
> > +
> > + /* set output state */
> > + ctx->accept_writes = false;
> > + ctx->write_xid = txn->xid;
> > + ctx->write_location = change->lsn;
> > + ctx->end_xact = false;
> >
> > This patch reverts many of the changes of the previous commit which tried to
> > fix this issue i.e. 55558df2374. end_xact was introduced by the same commit but
> > without much explanation of that in the commit message. Its only user,
> > WalSndUpdateProgress(), is probably making a wrong assumption as well.
> >
> > * We don't have a mechanism to get the ack for any LSN other than end
> > * xact LSN from the downstream. So, we track lag only for end of
> > * transaction LSN.
> >
> > IIUC, WAL sender tracks the LSN of the last WAL record read in sentPtr which is
> > sent downstream through a keep alive message. Downstream may
> > acknowledge this
> > LSN. So we do get ack for any LSN, not just commit LSN.
> >
> > So I propose removing end_xact as well.
>
> We didn't track the lag during a transaction because it could make the
> calculations of lag functionality inaccurate. If we track every lsn, it could
> fail to record important lsn information because of
> WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS (see function WalSndUpdateProgress).
> Please see details in [1] and [2].

LagTrackerRead() interpolates to reduce the inaccuracy. I don't
understand why we need to track the end LSN only. But I don't think
that affects this fix. So I am fine if we want to leave end_xact
there.

--
Best Wishes,
Ashutosh Bapat

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-01-16 16:28:52 Re: Rethinking the implementation of ts_headline()
Previous Message Robert Haas 2023-01-16 16:25:03 Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation