Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, pgbf(at)twiska(dot)com
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Date: 2023-09-24 23:58:45
Message-ID: ZRDNNf8Etlvuo48a@paquier.xyz
Lists: pgsql-bugs

On Mon, Sep 25, 2023 at 09:02:35AM +1300, Thomas Munro wrote:
> I see there was a failure on 16 on the very slow AIX box, and I have
> access so looking into that...

Lucky you, if I may say ;)

A bunch of non-Intel architectures are failing. Here is a summary
based on the buildfarm reports:
topminnow, mips64el with gcc 4.9.2
mereswine, ARMv7 with gcc 10.2.1
sungazer, ppc64 with gcc 8.3.0
frogfish, mips64el with gcc 4.6.3
mamba, macppc with gcc 10.4.0
gull, ARMv7 with clang 13.0.0
grison, ARMv7 with gcc 4.6.3
copperhead, riscv64 with gcc 10.X

The closest thing I have at hand is tanager on ARMv7 (it had not
reported to the buildfarm for a few weeks because it overheated in the
summer here, but I've put it back online now). However, it passed a
few hundred cycles with both gcc and clang yesterday, on top of having
a clean buildfarm run.

With sungazer now failing on REL_16_STABLE, it feels to me that we are
actually looking at two bugs? One on HEAD, and one in stable
branches? For HEAD and the 2PC failure, the records up to PREPARE
TRANSACTION should be replayed by the standby getting promoted, but
I'd rather dig into that with a host that's able to report the
failure.

copperhead seems to be one of the failing hosts that's able to compile
things quickly. Tom, Noah, or copperhead's owner, would it be possible
to get access to one of the failing hosts for more investigation? I
would do nothing more than compile the code and check the state of the
2PC test for this promotion failure.
--
Michael
