Re: 001_rep_changes.pl fails due to publisher stuck on shutdown

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: amit(dot)kapila16(at)gmail(dot)com
Cc: smithpb2250(at)gmail(dot)com, exclusion(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 001_rep_changes.pl fails due to publisher stuck on shutdown
Date: 2024-06-12 01:13:27
Message-ID: 20240612.101327.1997110414413074864.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Tue, 11 Jun 2024 14:27:28 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> On Tue, Jun 11, 2024 at 12:34 PM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >
> > At Tue, 11 Jun 2024 11:32:12 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> > > Sorry, it is not clear to me why we failed to flush the last
> > > continuation record in logical walsender? I see that we try to flush
> > > the WAL after receiving got_STOPPING in WalSndWaitForWal(), why is
> > > that not sufficient?
> >
> > It seems that, it uses XLogBackgroundFlush(), which does not guarantee
> > flushing WAL until the end.
> >
>
> What would it take to ensure the same? I am trying to explore this
> path because currently logical WALSender sends any outstanding logs up
> to the shutdown checkpoint record (i.e., the latest record) and waits
> for them to be replicated to the standby before exit. Please take a
> look at the comments where we call WalSndDone(). The fix you are
> proposing will break that guarantee.

Since 086221cf6b, the shutdown checkpoint is performed after the
walsenders have completed termination. The aim is to prevent
walsenders from generating WAL records (for example, via
CREATE_REPLICATION_SLOT) that compete with the shutdown checkpoint.
Thus, it seems that the walsender cannot see the shutdown checkpoint
record, nor a certain number of bytes preceding it, because the
walsender appears to have relied on the checkpoint, rather than on
XLogBackgroundFlush(), to flush that WAL.
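To make the gap concrete, here is a deliberately simplified C model,
not PostgreSQL code. It assumes an 8192-byte WAL page (the default
XLOG_BLCKSZ) and models only one behaviour of XLogBackgroundFlush():
the flush request is backed off to the last completed page boundary.
It ignores the asynchronous-commit fallback and everything else the
real function does. WalModel, background_flush() and full_flush() are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size; matches PostgreSQL's default XLOG_BLCKSZ of 8192. */
#define WAL_PAGE_SIZE 8192

/* Hypothetical, heavily simplified view of two WAL positions. */
typedef struct
{
    uint64_t insert_lsn;  /* end of the last inserted record */
    uint64_t flushed_lsn; /* all WAL before this point is durable */
} WalModel;

/*
 * Models only the page-boundary back-off of a background flush: the
 * request is rounded down to the last completed page, so the tail of
 * a record sitting on a partially filled page stays unflushed.
 */
static void
background_flush(WalModel *wal)
{
    uint64_t target = wal->insert_lsn - wal->insert_lsn % WAL_PAGE_SIZE;

    if (target > wal->flushed_lsn)
        wal->flushed_lsn = target;
    assert(wal->flushed_lsn <= wal->insert_lsn);
}

/* Models a flush that reaches the insert position, as a checkpoint's does. */
static void
full_flush(WalModel *wal)
{
    wal->flushed_lsn = wal->insert_lsn;
}
```

For example, with a record ending 100 bytes into the third page
(insert_lsn = 2 * 8192 + 100), background_flush() leaves flushed_lsn
at 16384, so in this model a walsender that only sends flushed WAL
never sees the last 100 bytes; full_flush() closes the gap.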

If we accept that the walsender terminates before the shutdown
checkpoint, we need to "fix" that comment, and then provide a function
that ensures WAL records are flushed synchronously up to the end.
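A rough sketch of what such a function would have to guarantee, again
as a toy C model rather than PostgreSQL code: before exiting, the
stopping walsender forces a synchronous flush up to the insert
position, sends everything flushed, and only then may exit. StopState,
ensure_wal_flushed(), send_flushed_wal() and walsender_may_exit() are
hypothetical. In PostgreSQL this might map to something like calling
XLogFlush() with the current insert position, but that is an
assumption, not something this message proposes. The model also omits
waiting for the standby to confirm what was sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the positions a stopping walsender cares about. */
typedef struct
{
    uint64_t insert_lsn;  /* end of WAL inserted so far */
    uint64_t flushed_lsn; /* WAL durable on disk up to here */
    uint64_t sent_lsn;    /* WAL sent to the receiver up to here */
} StopState;

/*
 * Hypothetical "ensure synchronization" step: flush synchronously up
 * to the insert position instead of relying on a background flush
 * that may stop at a page boundary.
 */
static void
ensure_wal_flushed(StopState *s)
{
    if (s->flushed_lsn < s->insert_lsn)
        s->flushed_lsn = s->insert_lsn;
}

/* Send whatever has been flushed but not yet sent. */
static void
send_flushed_wal(StopState *s)
{
    assert(s->flushed_lsn <= s->insert_lsn);
    if (s->sent_lsn < s->flushed_lsn)
        s->sent_lsn = s->flushed_lsn;
}

/* In this model, exit is allowed only once every inserted byte is sent. */
static bool
walsender_may_exit(const StopState *s)
{
    return s->sent_lsn == s->insert_lsn;
}
```

Without the ensure_wal_flushed() call, a walsender starting from
{insert 16484, flushed 16384, sent 16384} can send nothing more and
never satisfies walsender_may_exit(); with it, the remaining 100 bytes
are flushed and sent.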

I'll consider this direction for a while.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
