Re: walreceiver fails on asynchronous replica [SEC=UNOFFICIAL]

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: Mark(dot)Schloss(at)austrac(dot)gov(dot)au
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: walreceiver fails on asynchronous replica [SEC=UNOFFICIAL]
Date: 2024-02-26 08:26:51
Message-ID: 20240226.172651.1818858959505860537.horikyota.ntt@gmail.com
Lists: pgsql-general

At Fri, 23 Feb 2024 04:04:03 +0000, Mark Schloss <Mark(dot)Schloss(at)austrac(dot)gov(dot)au> wrote in
> <2024-02-23 07:50:05.637 AEDT [1957121]: [1-1] user=,db= > LOG: started streaming WAL from primary at 6/B0000000 on timeline 5
> <2024-02-23 07:50:05.696 AEDT [1957117]: [6-1] user=,db= > LOG: invalid magic number 0000 in log segment 0000000500000006000000B0, offset 0

This suggests that the WAL file the standby fetched was zero-filled on
the primary side, which cannot happen in normal operation. A
preallocated WAL segment can be zero-filled, but such a segment is
never replicated in normal operation.
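
A minimal Python sketch of how one might check a copy of the segment
for this condition. It relies on the WAL page header beginning with the
16-bit xlp_magic field, and it assumes the script runs on a machine with
the same byte order as the server. The function names are hypothetical
and only for illustration.

```python
import struct


def first_page_magic(path):
    """Return xlp_magic, the first 16-bit field of the first WAL page header."""
    with open(path, "rb") as f:
        head = f.read(2)
    if len(head) < 2:
        raise ValueError("file too short to contain a WAL page header")
    # Assumption: native byte order matches the server that wrote the file.
    return struct.unpack("=H", head)[0]


def is_zero_filled(path, chunk_size=1 << 20):
    """Return True if every byte in the file is zero (also True for an empty file)."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True
            # bytes.count(0) counts zero bytes; any shortfall means real data.
            if chunk.count(0) != len(chunk):
                return False
```

A magic value of 0 at offset 0 together with is_zero_filled() returning
True would be consistent with a preallocated segment that was never
written.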

> <2024-02-22 14:20:23.383 AEDT [565231]: [6-1] user=,db= > FATAL: terminating walreceiver process due to administrator command

This may suggest a config reload with some parameter changes.

One possible scenario matching the log lines is that someone switched
primary_conninfo to a newly restored primary. If the new primary had
older data than the previously connected primary, the segment
0..5..6..B0 on it might have been a preallocated one filled with
zeros. The standby could then fetch that zero-filled WAL segment (file)
and fail this way. If this is the case, such operations should be
avoided.
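
To compare what each primary holds for that position, it can help to map
the segment file name from the log to its timeline and starting LSN. The
sketch below assumes the default 16MB WAL segment size (it differs if
initdb was run with another --wal-segsize); the function name is
hypothetical.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default segment size


def decode_wal_segment_name(name, segment_size=WAL_SEGMENT_SIZE):
    """Split a 24-hex-digit WAL segment name into (timeline, start LSN string)."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment file name")
    timeline = int(name[0:8], 16)   # first 8 hex digits: timeline ID
    log = int(name[8:16], 16)       # middle 8: high part of the segment number
    seg = int(name[16:24], 16)      # last 8: segment within that 4GB range
    segments_per_log = 0x100000000 // segment_size
    start = (log * segments_per_log + seg) * segment_size
    # Format like PostgreSQL prints LSNs: high/low 32 bits in hex.
    return timeline, f"{start >> 32:X}/{start & 0xFFFFFFFF:X}"


print(decode_wal_segment_name("0000000500000006000000B0"))
# (5, '6/B0000000')
```

The result agrees with the "started streaming WAL from primary at
6/B0000000 on timeline 5" line quoted above.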

Unfortunately, without a reproducer or details of the operations that
took place there, the best I can do now is guess at a possible scenario
as described above. I'm not sure how the situation actually arose.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
