Re: 001_rep_changes.pl fails due to publisher stuck on shutdown

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: smithpb2250(at)gmail(dot)com
Cc: exclusion(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 001_rep_changes.pl fails due to publisher stuck on shutdown
Date: 2024-06-06 06:19:20
Message-ID: 20240606.151920.427007697352129737.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Thu, 6 Jun 2024 12:49:45 +1000, Peter Smith <smithpb2250(at)gmail(dot)com> wrote in
> Hi, I have reproduced this multiple times now.
>
> I confirmed the initial post/steps from Alexander. i.e. The test
> script provided [1] gets itself into a state where function
> ReadPageInternal (called by XLogDecodeNextRecord and commented "Wait
> for the next page to become available") constantly returns
> XLREAD_FAIL. Ultimately the test times out because WalSndLoop() loops
> forever, since it never calls WalSndDone() to exit the walsender
> process.

Thanks for the repro; I believe I understand what's happening here.

During server shutdown, the latter half of the last continuation
record may fail to be flushed. This is similar to what is described in
the commit message of commit ff9f111bce. While shutting down,
WalSndLoop() waits for XLogSendLogical() to consume WAL up to
flushPtr, but in this case, the last record cannot complete without
the continuation part starting from flushPtr, which is
missing. However, in such cases, xlogreader.missingContrecPtr is set
to the beginning of the missing part, but something similar to

So, I believe the attached small patch fixes the behavior. I haven't
come up with a good test script for this issue. Something like
026_overwrite_contrecord.pl might work, but this situation seems a bit
more complex than what it handles.

Versions back to 10 should suffer from the same issue and the same
patch will be applicable without significant changes.
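
As a rough illustration of the condition described above, here is a
standalone sketch. It is not the attached patch and does not use the
real walsender code. MockReader and mock_caught_up() are hypothetical
names, and XLogRecPtr and InvalidXLogRecPtr are redefined locally so the
snippet compiles on its own. It assumes that "caught up" should also
hold when the reader reports a missing continuation record that starts
exactly at flushPtr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles outside the PostgreSQL tree. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Hypothetical reduction of the reader state to the two fields that
 * matter for this discussion.
 */
typedef struct MockReader
{
	XLogRecPtr	EndRecPtr;			/* end of the last fully decoded record */
	XLogRecPtr	missingContrecPtr;	/* start of a missing continuation
									 * part, or InvalidXLogRecPtr */
} MockReader;

/*
 * Return true if the sender has nothing more it can decode up to
 * flushPtr.  The second test is the assumption being illustrated: a
 * record whose continuation would start at flushPtr can never be
 * completed during shutdown, so waiting for it would loop forever.
 */
bool
mock_caught_up(const MockReader *reader, XLogRecPtr flushPtr)
{
	assert(flushPtr != InvalidXLogRecPtr);

	/* Ordinary case: every flushed record has been consumed. */
	if (reader->EndRecPtr >= flushPtr)
		return true;

	/* Shutdown case: the rest of the last record was never flushed. */
	if (reader->missingContrecPtr != InvalidXLogRecPtr &&
		reader->missingContrecPtr == flushPtr)
		return true;

	return false;
}
```

With only the first test, a reader stuck at EndRecPtr < flushPtr never
reports being caught up, which matches the endless WalSndLoop() seen in
the repro.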

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment: 0001-Fix-infinite-loop-in-walsender-during-publisher-shut.patch (text/x-patch, 1.7 KB)
