Re: Incorrect result of bitmap heap scan.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Incorrect result of bitmap heap scan.
Date: 2024-12-02 16:43:42
Message-ID: 1643257.1733157822@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> On Mon, Dec 2, 2024 at 10:15 AM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
>> The running theory is that bitmap executor nodes incorrectly assume
>> that the rows contained in the bitmap all are still present in the
>> index, and thus assume they're allowed to only check the visibility
>> map to see if the reference contained in the bitmap is visible.
>> However, this seems incorrect: Note that index AMs must hold at least
>> pins on the index pages that contain their results when those results
>> are returned by amgettuple() [0], and that amgetbitmap() doesn't do
>> that for all TIDs in the bitmap; thus allowing vacuum to remove TIDs
>> from the index (and later, heap) that are still present in the bitmap
>> used in the scan.

> This theory seems very believable.

I'm not convinced. I think there are two assumptions underlying
bitmap scan:

1. Regardless of index contents, it's not okay for vacuum to remove
tuples that an open transaction could potentially see. So the heap
tuple will be there if we look, unless it was already dead (in which
case it could have been replaced, so we have to check visibility of
whatever we find).

2. If the page's all-visible bit is set, there has been no recent
change in its contents, so we don't have to look at the page.
"Recent" is a bit squishily defined, but again it should surely
cover outdating or removal of a tuple that an open transaction
could see.

If this is not working, I am suspicious that somebody made page
freezing too aggressive somewhere along the line.

Whether that's true or not, it seems like it'd be worth bisecting
to see if we can finger a commit where the behavior changed (and
the same goes for the question of why-isnt-it-an-IOS-scan). However,
the reproducer seems to have quite a low failure probability for me,
which makes it hard to do bisection testing with much confidence.
Can we do anything to make the test more reliable? If I'm right
to suspect autovacuum, maybe triggering lots of manual vacuums
would improve the odds?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-12-02 16:44:17 Re: optimize file transfer in pg_upgrade
Previous Message Nathan Bossart 2024-12-02 16:34:21 Re: revamp row-security tracking