Re: Corruption during WAL replay

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, deniel1495(at)mail(dot)ru, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, tejeswarm(at)hotmail(dot)com, hlinnaka <hlinnaka(at)iki(dot)fi>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Daniel Wood <hexexpert(at)comcast(dot)net>
Subject: Re: Corruption during WAL replay
Date: 2022-03-25 05:34:45
Message-ID: 20220325053445.4clfc7of3y5yvesy@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-03-25 01:23:00 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > I do see that the LSN that ends up on the page is the same across a few runs
> > of the test on serinus. Which presumably differs between different
> > animals. Surprised that it's this predictable - but I guess the run is short
> > enough that there's no variation due to autovacuum, checkpoints etc.
>
> Uh-huh. I'm not surprised that it's repeatable on a given animal.
> What remains to be explained:
>
> 1. Why'd it start failing now? I'm guessing that ce95c5437 *was* the
> culprit after all, by slightly changing the amount of catalog data
> written during initdb, and thus moving the initial LSN.

Yep, verified that (see mail I just sent).

> 2. Why just these two animals? If initial LSN is the critical thing,
> then the results of "locale -a" would affect it, so platform
> dependence is hardly surprising ... but I'd have thought that all
> the animals on that host would use the same initial set of
> collations.

I think it's the animal's name that makes the difference, due to the
tablespace path length thing. And while I was confused for a second by

petalura
pogona
serinus
dragonet

failing, despite different name lengths, it still makes sense: We MAXALIGN the
start of records. Which explains why flaviventris didn't fail the same way.
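
To illustrate the rounding effect only (this is a sketch, not the actual WAL record layout): the helper names below are hypothetical, and the alignment of 8 is an assumption that matches typical 64-bit platforms. It rounds just the animal name length up to the alignment boundary, ignoring the rest of the path and the record contents.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed alignment. PostgreSQL's MAXIMUM_ALIGNOF is platform dependent;
 * 8 is typical on 64-bit systems.
 */
#define SKETCH_MAXIMUM_ALIGNOF 8

/* Round len up to the next multiple of SKETCH_MAXIMUM_ALIGNOF. */
static size_t
sketch_maxalign(size_t len)
{
    return (len + (SKETCH_MAXIMUM_ALIGNOF - 1)) &
        ~((size_t) (SKETCH_MAXIMUM_ALIGNOF - 1));
}

/*
 * Hypothetical helper: aligned size of a bare animal name. The real path
 * has more components; this only shows how nearby lengths collapse.
 */
static size_t
sketch_aligned_name_len(const char *name)
{
    return sketch_maxalign(strlen(name));
}
```

Under those assumptions, names of 6, 7 and 8 bytes all round up to 8, while the 12-byte flaviventris rounds up to 16, so it lands in a different bucket.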

> As for a fix, would damaging more of the page help? I guess
> it'd just move around the one-in-64K chance of failure.

As I wrote in the other email, I think spreading the changes out wider might
help. But it's still not great. However:

> Maybe we have to intentionally corrupt (e.g. invert) the
> checksum field specifically.

seems like it'd do the trick? Even a single bit change of the checksum ought
to do, as long as we don't set it to 0.
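
A rough sketch of what I mean, in C for illustration rather than in the form the test would actually use. The function names are hypothetical, and the offset of 8 assumes that pd_checksum directly follows the 8-byte pd_lsn at the start of the page header:

```c
#include <stdint.h>
#include <string.h>

/* Assumption: pd_checksum sits right after the 8-byte pd_lsn. */
#define SKETCH_CHECKSUM_OFFSET 8

/*
 * Flip a single bit of the checksum. If that would produce 0 (which
 * happens only for an old value of 1), flip a different bit instead,
 * so the result always differs from the old value and is never 0.
 */
static uint16_t
sketch_damage_checksum(uint16_t old)
{
    uint16_t damaged = old ^ 0x0001;

    if (damaged == 0)
        damaged = old ^ 0x0002;
    return damaged;
}

/* Apply the damage to the checksum field of an in-memory page image. */
static void
sketch_damage_page_checksum(unsigned char *page)
{
    uint16_t checksum;

    /* memcpy avoids alignment assumptions about the buffer */
    memcpy(&checksum, page + SKETCH_CHECKSUM_OFFSET, sizeof(checksum));
    checksum = sketch_damage_checksum(checksum);
    memcpy(page + SKETCH_CHECKSUM_OFFSET, &checksum, sizeof(checksum));
}
```

Since the stored checksum is guaranteed to change, the outcome no longer depends on what LSN happens to be on the page.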

Greetings,

Andres Freund
