From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | Konstantin Knizhnik <knizhnik(at)garret(dot)ru> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Incorrect result of bitmap heap scan. |
Date: | 2024-12-02 15:08:02 |
Message-ID: | CAEze2Wg1Q4gWzm9RZ0yXydm23Mj3iScu8LA__Zz3JJEgpnoGPQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, 2 Dec 2024 at 15:25, Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:
>
> Hi hackers,
>
> Attached script reproduces the problem with incorrect results of `select count(*)` (it returns larger number of records than really available in the table).
> It is not always reproduced, so you may need to repeat it multiple times - at my system it failed 3 times from 10.
>
> The problem takes place with pg16/17/18 (other versions I have not checked).
I suspect all back branches are affected. As I partially also mentioned offline:
The running theory is that bitmap executor nodes incorrectly assume
that the rows contained in the bitmap all are still present in the
index, and thus assume they're allowed to only check the visibility
map to see if the reference contained in the bitmap is visible.
However, this seems incorrect: Note that index AMs must hold at least
pins on the index pages that contain their results when those results
are returned by amgettuple() [0], and that amgetbitmap() doesn't do
that for all TIDs in the bitmap; thus allowing vacuum to remove TIDs
from the index (and later, heap) that are still present in the bitmap
used in the scan.
Concurrency timeline:
Session 1. amgetbitmap() gets snapshot of index contents, containing
references to dead tuples on heap P1.
Session 2. VACUUM runs on the heap, removes TIDs for P1 from the
index, deletes those TIDs from the heap pages, and finally sets heap
pages' VM bits to ALL_VISIBLE, including the now cleaned page P1
Session 1. Executes the bitmap heap scan that uses the bitmap without
checking tuples on P1 due to ALL_VISIBLE and a lack of output columns.
I think this might be an oversight when the feature was originally
committed in 7c70996e (PG11): we don't know when the VM bit was set,
and the bitmap we're scanning may thus be out-of-date (and should've
had TIDs removed it it had been an index), so I propose disabling this
optimization for now, as attached. Patch v1 is against a recent HEAD,
PG17 applies to the 17 branch, and PG16- should work on all (or at
least, most) active backbranches older than PG17's.
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
PS.
I don't think the optimization itself is completely impossible, and we
can probably re-enable an optimization like that if (or when) we find
a way to reliably keep track of when to stop using the optimization. I
don't think that's an easy fix, though, and definitely not for
backbranches.
The solution I could think to keep most of this optimization requires
the heap bitmap scan to notice that a concurrent process started
removing TIDs from the heap after amgetbitmap was called; i.e.. a
"vacuum generation counter" incremented every time heap starts the
cleanup run. This is quite non-trivial, however, as we don't have much
in place regarding per-relation shared structures which we could put
such a value into, nor a good place to signal changes of the value to
bitmap heap-scanning backends.
Attachment | Content-Type | Size |
---|---|---|
PG17-Disable-BitmapScan-s-skip_fetch-optimization.patch.txt | text/plain | 7.5 KB |
v1-0001-Remove-BitmapScan-s-skip_fetch-optimization.patch | application/x-patch | 8.3 KB |
PG16-_Disable-BitmapScan-s-skip_fetch-optimization.patch.txt | text/plain | 4.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Turelinckx | 2024-12-02 15:25:56 | Re: RISC-V animals sporadically produce weird memory-related failures |
Previous Message | Nazir Bilal Yavuz | 2024-12-02 14:55:32 | Re: Using read stream in autoprewarm |