Re: Exceptional md.c paths for recovery and zero_damaged_pages

From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Exceptional md.c paths for recovery and zero_damaged_pages
Date: 2024-12-17 21:28:52
Message-ID: 7lvoqfv4zqxlri6fw3hd4byjh2vzhzfmbjb5lehocv4gsl4fx6@k7eb65wij6tn
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2024-12-17 19:57:13 +0200, Heikki Linnakangas wrote:
> On 14/12/2024 01:44, Andres Freund wrote:
> > The zero_damaged_pages path in bufmgr.c makes sense to me, but this one seems
> > less sane to me. If you want to recover from a data corruption event and
> > can't dump the data because a seqscan stumbles over an invalid page -
> > zero_damaged_pages makes sense.
> >
> > Seqscans or tidscans won't reach the mdreadv() path, because they check the
> > relation size first. Which leaves access from indexes - e.g. an index pointer
> > beyond the end of the heap. But in that case it's not sane to use
> > zero_damaged_pages, because that's almost a guarantee for worsening corruption
> > in the future, because the now empty heap page will eventually be filled with
> > new tuples, which now will be pointed to by index entries pointing that were
> > created before the zeroing.
>
> Well, if you need to do zero_damage_pages=off, you're screwed already, so I
> don't know think the worsening corruption argument matters much.

Well, it matters in the sense of it being important to keep seqscans somewhat
working, as that's required for extracting as much data as possible with
pg_dump. But I don't think there's an equivalent need to keep seqscans
working, given that the only valid action is to reindex anyway.

> And you have the same problem by pages zeroed by a seqscan too.

I don't think so? For seqscans we should *never* hit the "zero a page beyond
EOF" path, because the heapscan will check the relation size at the start of
the scan. You definitely can hit the case of zeroing a heap page, but that
page will still correspond to an on-disk page.

I'd like to get rid of the "zero a page beyond EOF" code, without touching the
"zeroing out page" path that's currently in WaitReadBuffers().

As probably obvious, the reason for this is that I have to rejigger the code
for AIO, and I don't want to write and test code that never makes sense to
reach.

> > I'm wondering if we should just put an error into the relevant paths in HEAD
> > and see whether it triggers for anybody in the next months. Having all these
> > untested paths in md.c forever doesn't seem great.
>
> +1

Cool. Will write something up.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-12-17 21:32:06 Re: Crash: invalid DSA memory alloc request
Previous Message Peter Geoghegan 2024-12-17 21:11:53 Re: Exceptional md.c paths for recovery and zero_damaged_pages