Re: Exceptional md.c paths for recovery and zero_damaged_pages

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Exceptional md.c paths for recovery and zero_damaged_pages
Date: 2024-12-17 23:24:48
Message-ID: c20dd012-7a4b-482b-972f-3ffa57da941f@iki.fi
Lists: pgsql-hackers

On 17/12/2024 23:28, Andres Freund wrote:
> On 2024-12-17 19:57:13 +0200, Heikki Linnakangas wrote:
>> On 14/12/2024 01:44, Andres Freund wrote:
>>> The zero_damaged_pages path in bufmgr.c makes sense to me, but this one seems
>>> less sane to me. If you want to recover from a data corruption event and
>>> can't dump the data because a seqscan stumbles over an invalid page -
>>> zero_damaged_pages makes sense.
>>>
>>> Seqscans or tidscans won't reach the mdreadv() path, because they check the
>>> relation size first. Which leaves access from indexes - e.g. an index pointer
>>> beyond the end of the heap. But in that case it's not sane to use
>>> zero_damaged_pages, because that's almost a guarantee for worsening corruption
>>> in the future, because the now empty heap page will eventually be filled with
>>> new tuples, which will then be pointed to by index entries that were
>>> created before the zeroing.
>>
>> Well, if you need to do zero_damaged_pages=off, you're screwed already, so I
>> don't think the worsening corruption argument matters much.
>
> Well, it matters in the sense of it being important to keep seqscans somewhat
> working, as that's required for extracting as much data as possible with
> pg_dump. But I don't think there's an equivalent need to keep index scans
> working, given that the only valid action is to reindex anyway.
>
>
>> And you have the same problem with pages zeroed by a seqscan too.
>
> I don't think so? For seqscans we should *never* hit the "zero a page beyond
> EOF" path, because the heapscan will check the relation size at the start of
> the scan. You definitely can hit the case of zeroing a heap page, but that
> page will still correspond to an on-disk page.

I meant this scenario:

0. Heap block 123 has some live tuples on it, but it is corrupt.
1. set zero_damaged_pages=on
2. Perform seqscan. It zeroes block 123
3. set zero_damaged_pages=off
4. Insert new tuples. They get inserted to block 123.

Any index entries for the original heap tuples on block 123 that got
zeroed out will now incorrectly point to the new tuples you inserted. No
reads beyond EOF involved.
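
To make the steps above concrete, here is a toy model in Python. It is
only a sketch: the heap is a dict of block number to a list of tuples,
the index maps a key to (block, slot) pairs, and insert(), zero_page()
and index_fetch() are made-up names for illustration, not PostgreSQL
functions. It assumes new tuples reuse the freed slots on the zeroed
block, and that an index fetch returns whatever sits at the TID.

```python
# Toy model only: none of these names correspond to real PostgreSQL code.
# heap:  block number -> list of (key, value) tuples; list position = slot
# index: key -> list of TIDs, where a TID is (block, slot)

def insert(heap, index, block, key, value):
    """Append a tuple to the block and add an index entry pointing at it."""
    page = heap[block]
    page.append((key, value))
    index.setdefault(key, []).append((block, len(page) - 1))

def zero_page(heap, block):
    """Model of zeroing a damaged page: the page becomes empty,
    but the index entries that pointed into it are left alone."""
    heap[block] = []

def index_fetch(heap, index, key):
    """Follow each index entry for the key and return whatever is at that TID."""
    results = []
    for block, slot in index.get(key, []):
        page = heap[block]
        if slot < len(page):
            results.append(page[slot])
    return results

heap = {123: []}
index = {}

# Step 0: block 123 holds live tuples (imagine the page itself is corrupt).
insert(heap, index, 123, "alice", "old row A")
insert(heap, index, 123, "bob", "old row B")

# Steps 1-3: a seqscan with zero_damaged_pages=on zeroes block 123.
zero_page(heap, 123)

# Step 4: a new tuple lands on the now-empty block and reuses slot 0.
insert(heap, index, 123, "carol", "new row C")

# The stale index entry for "alice" still points at (123, 0),
# which now holds carol's tuple.
stale = index_fetch(heap, index, "alice")
print(stale)
```

In this model the lookup for "alice" comes back with carol's row, and
the block number never exceeds the size of the relation at any point.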

--
Heikki Linnakangas
Neon (https://neon.tech)
