Re: Exceptional md.c paths for recovery and zero_damaged_pages

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Exceptional md.c paths for recovery and zero_damaged_pages
Date: 2024-12-17 22:37:52
Message-ID: CAH2-WzmURNkm9xYw8V4vCbtw7nibCR1bkT=xmuJOJoanjpxkNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 17, 2024 at 5:27 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Bitmap heapscans go through a distinct path, so it'd be trivial to accept tids
> pointing into the void for bitmap scans but not index [only] scans.

I wasn't clear. Plain nbtree index scans can drop their leaf page
buffer pin, which makes them prone to following TIDs subject to
concurrent TID recycling. That's the case that any new sanity check in
this area will need to not break. It's fairly similar to what we need
to do with bitmap index scans, but *does* use this path.

I actually think that we might well need to extend this "drop pin to
avoid blocking VACUUM" behavior to GiST and SP-GiST before long (to
avoid regressions when we fix the index-only scan bug affecting GIST +
SP-GiST). I think that it (i.e. the _bt_drop_lock_and_maybe_pin stuff)
needs to be generalized, since it's a fairly index-AM-independent
thing.

> I think the case where we'd need to *not* error out would be if the heap page
> LSN is *newer* than the index leaf page LSN, as that could be due to
> lazy_vacuum_heap() marking items unused between getting the tid from the index
> scan and the heap access.

Ah, okay.

> For it to be legal to remove items from the heap page between the scan of the
> leaf page and the heap fetch, the btree page would need to have been processed
> by btvacuumpage() (with a cleanup lock) and the heap page by
> lazy_vacuum_heap_page().

Just to be clear, the cleanup lock isn't only protective for scans
that don't drop their pin. This is why we need to worry about plain
nbtree index scans that use an MVCC snapshot. (I think that you know
this already, but just want to be clear.)

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2024-12-17 22:50:31 Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Previous Message Andres Freund 2024-12-17 22:27:11 Re: Exceptional md.c paths for recovery and zero_damaged_pages