Re: Exceptional md.c paths for recovery and zero_damaged_pages

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Exceptional md.c paths for recovery and zero_damaged_pages
Date: 2024-12-17 22:07:34
Message-ID: CAH2-WznWHUtTf48JQrznmmZKm1OwxETo7+Eb24w4TiTWAr+4dw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 17, 2024 at 4:46 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> ISTM that we could do better with some fairly simple cooperation between index
> and table AM. It should be rather rare to look up a TID that was removed
> between finding the index entry and fetching the table entry; a small amount
> of extra work in that case ought to be ok.

Maybe, but I just want to be clear: as far as I know, as things stand
we're very permissive about what can happen around concurrent TID
recycling. We need to be, because (among other reasons) a bitmap index
scan can build its bitmap from the index as it was some time ago, and
so cannot prevent concurrent recycling of TIDs that point to
dead-to-everybody tuples (or to LP_DEAD heap page stubs).
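
To make the stale-bitmap case concrete, here is a toy Python model; it is
not PostgreSQL code. Everything in it is an assumption made for
illustration: the HeapItem class, the fetch_visible and demo_stale_bitmap
functions, the three string states, and the simplified "xmin below the
snapshot horizon" visibility rule are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeapItem:
    state: str                     # "NORMAL", "DEAD" (stub) or "UNUSED"
    xmin: int = 0                  # toy id of the inserting transaction
    payload: Optional[str] = None


def fetch_visible(heap, tid, snapshot_xmax):
    """Return the payload at tid if the toy snapshot can see it, else None."""
    item = heap.get(tid)
    if item is None or item.state != "NORMAL":
        return None                # stub or unused slot: tolerated, not an error
    if item.xmin >= snapshot_xmax:
        return None                # inserted after the snapshot: invisible
    return item.payload


def demo_stale_bitmap():
    tid = (0, 1)                   # (block number, offset number)
    heap = {tid: HeapItem("DEAD")} # dead-to-everybody stub, still indexed
    index = {"a": tid}
    snapshot_xmax = 100            # the scan's toy snapshot horizon

    # The scan builds its bitmap from the index as it is right now.
    bitmap = set(index.values())

    # Later, cleanup removes the index entry and frees the heap slot ...
    del index["a"]
    heap[tid] = HeapItem("UNUSED")

    # ... and an unrelated insert recycles the same TID.
    heap[tid] = HeapItem("NORMAL", xmin=150, payload="unrelated row")
    index["b"] = tid

    # The scan now visits the TIDs recorded in its stale bitmap.
    return [fetch_visible(heap, t, snapshot_xmax) for t in sorted(bitmap)]
```

In this model the stale bitmap still names the recycled TID, and the fetch
simply returns nothing for it instead of treating the mismatch as
corruption, which is the permissive behavior described above.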

> Could we e.g., for logged tables, track the LSN of the leaf index page in the
> IndexScanDesc and pass that to table_index_fetch_tuple() and only error out
> if the table relation has a newer LSN? That leaves a small window for a
> false-negative, but shouldn't have any false-positives?

Technically that would work, but it might not be very useful.

It's very typical for a heap page LSN to be older than a corresponding
index leaf page LSN, since inserts start with the heap tuple insertion
(the index tuple insertion must happen afterwards). Plus index pages
almost always store more tuples than their corresponding heap pages
and so are presumably more likely to be modified.
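
The LSN ordering described here can be shown with a toy Python model (not
PostgreSQL code). The ToyWal and Page classes and the insert_row and
demo_lsn_order functions are hypothetical, and the model assumes one WAL
record per page change with LSNs that simply count up from 1.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lsn: int = 0                   # LSN of the last change to this page


class ToyWal:
    """Hands out increasing LSNs; a stand-in for real WAL insertion."""

    def __init__(self):
        self.next_lsn = 1

    def log_change(self, page):
        page.lsn = self.next_lsn
        self.next_lsn += 1


def insert_row(wal, heap_page, leaf_page):
    wal.log_change(heap_page)      # heap tuple insertion comes first
    wal.log_change(leaf_page)      # index tuple insertion happens afterwards


def demo_lsn_order():
    wal = ToyWal()
    leaf = Page()                  # one leaf page covering several heap pages
    heap_pages = [Page(), Page(), Page()]
    for heap_page in heap_pages:
        insert_row(wal, heap_page, leaf)
    return leaf.lsn, [p.lsn for p in heap_pages]
```

After three inserts the leaf page carries the newest LSN and every heap
page it points to has an older one, so a direct comparison between the two
LSNs mostly reflects ordinary insert ordering.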

> I've seen enough bugs / corruption leading to indexes pointing to wrong and
> nonexisting tuples to make me think it's worth being a bit more proactive
> about raising errors for such cases. Of course what I described above can
> only detect index entries pointing to non-existing tuples, but that's still
> a good bit better than nothing.

OTOH we've had index_delete_check_htid for several years now, and I've
yet to hear a report involving one of its errors. That's not
conclusive, but it does suggest that this might not be a huge problem.

--
Peter Geoghegan
