Re: Corruption during WAL replay

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, deniel1495(at)mail(dot)ru, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, tejeswarm(at)hotmail(dot)com, hlinnaka <hlinnaka(at)iki(dot)fi>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Daniel Wood <hexexpert(at)comcast(dot)net>
Subject: Re: Corruption during WAL replay
Date: 2022-03-25 05:38:45
Message-ID: 3193652.1648186725@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> Ah, and that's finally also the explanation why I couldn't reproduce the
> failure it in a different directory, with an otherwise identically configured
> PG: The length of the path to the tablespace influences the size of the
> XLOG_TBLSPC_CREATE record.

Ooooohhh ... yeah, that could explain a lot of cross-animal variation.

> Not sure what to do here... I guess we can just change the value we overwrite
> the page with and hope to not hit this again? But that feels deeply deeply
> unsatisfying.

AFAICS, this strategy of whacking a predetermined chunk of the page with
a predetermined value is going to fail 1-out-of-64K times. We have to
change the test so that it's guaranteed to produce an invalid checksum.
Inverting just the checksum field, without doing anything else, would
do that ... but that feels pretty unsatisfying too.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-03-25 06:07:37 Re: Corruption during WAL replay
Previous Message Masahiko Sawada 2022-03-25 05:36:28 Re: Failed transaction statistics to measure the logical replication progress