From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Kuzmenkov <akuzmenkov(at)timescale(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Incorrect result of bitmap heap scan.
Date: 2024-12-04 11:56:08
Message-ID: CAEze2WiHrnpXU33p7fcbrUyRByfK9jStVHqjxJNTzAhObyCooA@mail.gmail.com
Lists: pgsql-hackers
On Mon, 2 Dec 2024 at 17:31, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2024-12-02 16:08:02 +0100, Matthias van de Meent wrote:
> > Concurrency timeline:
> >
> > Session 1. amgetbitmap() gets snapshot of index contents, containing
> > references to dead tuples on heap P1.
>
> IIUC, an unstated important aspect here is that these dead tuples are *not*
> visible to S1. Otherwise the VACUUM in the next step could not remove the dead
> tuples.
Correct: this is about TIDs that are dead to all transactions, and
which might therefore already be marked LP_DEAD in the heap.
> > PS.
> > I don't think the optimization itself is completely impossible, and we
> > can probably re-enable an optimization like that if (or when) we find
> > a way to reliably keep track of when to stop using the optimization. I
> > don't think that's an easy fix, though, and definitely not for
> > backbranches.
>
> One way to make the optimization safe could be to move it into the indexam. If
> an indexam would check the VM bit while blocking removal of the index entries,
> it should make it safe. Of course that has the disadvantage of needing more VM
> lookups than before, because it would not yet have been deduplicated...
I'm -0.5 on adding visibility checking to index AM's contract. The
most basic contract that an index AM must implement is currently:
1. Store the index tuples provided by aminsert()
2.a With amgettuple, return at least all stored TIDs that might match
the scankey (optionally marked with recheck when the AM isn't sure
about the TID matching the scankeys, or when returning TIDs not
provided by aminsert()), or
2.b With amgetbitmap, return a bitmap containing at least the TIDs
that match the description of 2.a (i.e. allows an index to have an
optimized approach to adding outputs of 2.a into a bitmap)
3. Remove all the bulkdelete-provided tids from its internal structures
Note that visibility checking is absent. Any visibility or dead tuple
hints (through e.g. kill_prior_tuple, or calling
table_index_delete_tuples) are optimizations which are not required
for basic index AM operations, and indeed they are frequently not
implemented in index AMs. Adding a requirement for index AMs to do
visibility checks would IMO significantly change the current
API/layering contracts.
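To make the three obligations above concrete, here is a toy in-memory
model of that minimal contract. All names (ToyIndex, toy_insert,
toy_getbitmap, toy_bulkdelete) are invented for illustration and are
not the real IndexAmRoutine callbacks or signatures; note that nothing
in it performs a visibility check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the index AM contract described
 * above; not PostgreSQL code. */

typedef struct { unsigned blkno; unsigned offno; } Tid;

#define MAX_TIDS 64
typedef struct
{
    Tid  tids[MAX_TIDS];
    bool live[MAX_TIDS];
    int  ntids;
} ToyIndex;

/* Obligation 1: store every tuple handed to the AM. */
static void
toy_insert(ToyIndex *idx, Tid tid)
{
    assert(idx->ntids < MAX_TIDS);
    idx->tids[idx->ntids] = tid;
    idx->live[idx->ntids] = true;
    idx->ntids++;
}

/* Obligation 2b: report (a superset of) the stored, matching TIDs.
 * No visibility checking happens here -- that is the point of the
 * contract as described above. */
static int
toy_getbitmap(const ToyIndex *idx, Tid *out)
{
    int n = 0;
    for (int i = 0; i < idx->ntids; i++)
        if (idx->live[i])
            out[n++] = idx->tids[i];
    return n;
}

/* Obligation 3: drop exactly the TIDs bulkdelete says are dead. */
static void
toy_bulkdelete(ToyIndex *idx, const Tid *dead, int ndead)
{
    for (int i = 0; i < idx->ntids; i++)
        for (int j = 0; j < ndead; j++)
            if (idx->live[i] &&
                idx->tids[i].blkno == dead[j].blkno &&
                idx->tids[i].offno == dead[j].offno)
                idx->live[i] = false;
}
```

Anything beyond this (kill_prior_tuple, table_index_delete_tuples) is
optional optimization on top, which is why bolting a required VM check
onto the AM side changes the layering.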
> Another issue with bitmap index scans is that it currently can't use
> killtuples. I've seen several production outages where performance would
> degrade horribly over time whenever estimates lead to important queries to
> switch to bitmap scans, because lots of dead tuples would get accessed over
> and over.
>
> It seems pretty much impossible to fix that with the current interaction
> between nodeBitmap* and indexam. I wonder if we could, at least in some cases,
> and likely with some heuristics / limits, address this by performing some
> visibility checks during the bitmap build.
It's an interesting approach worth exploring. I'm a bit concerned
about the IO patterns this would create, though, especially where
BitmapAnd nodes are involved: this would issue VM check IOs on the
order of O(sum_matching_tuple_pages) instead of
O(intersect_matching_tuple_pages). Additionally, it's wasted IO if
we're planning to go to the heap anyway, so this VM precheck would
have to be conditional on the bitmap scan not needing any table data
(e.g. row_number(), count()).
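The asymmetry can be shown with a toy count of VM probes for two input
bitmaps (hypothetical helper names; assumes one VM probe per distinct
matching heap page):

```c
#include <assert.h>

/* Prechecking the VM while building each input bitmap probes every
 * matching page of every input: O(sum_matching_tuple_pages). */
static int
probes_precheck(int npages_a, int npages_b)
{
    return npages_a + npages_b;
}

/* Checking the VM only after the BitmapAnd probes just the pages that
 * survive the intersection: O(intersect_matching_tuple_pages). */
static int
probes_post_and(const int *pages_a, int na,
                const int *pages_b, int nb)
{
    int n = 0;
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            if (pages_a[i] == pages_b[j])
                n++;    /* page present in both bitmaps */
    return n;
}
```

With selective, poorly-overlapping indexes the intersection can be far
smaller than the sum, which is where the precheck wastes the most IO.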
> I'm bringing it up here because I
> suspect that some of the necessary changes would overlap with what's needed to
> repair bitmap index-only scans.
I'd call this "bitmap-only scans" instead, as there might be multiple
indexes involved, but indeed, this also does sound like a viable
approach.
> > The solution I could think to keep most of this optimization requires
> > the heap bitmap scan to notice that a concurrent process started
> > removing TIDs from the heap after amgetbitmap was called; i.e. a
> > "vacuum generation counter" incremented every time heap starts the
> > cleanup run.
>
> I'm not sure this is a good path. We can already clean up pages during index
> accesses and we are working on being able to set all-visible during "hot
> pruning"/page-level-vacuuming. That'd probably actually be safe (because we
> couldn't remove dead items without a real vacuum), but it's starting to get
> pretty complicated...
I imagine this solution living in the executor/heapAM bitmap scan
code, not in the index AM's amgetbitmap. It'd note down the 'vacuum
generation counter' before extracting the index's bitmap, and then,
after every VM lookup during the bitmap heap scan, validate that the
generation counter hasn't changed (and thus that we can use that VM
bit as authoritative for the visibility of the TIDs we got). I don't
think that this interaction specifically is very complicated, but it
would indeed add to the overall complexity of the system.
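A minimal sketch of that check, with all structures and field names
invented for illustration (ToyHeap, ToyBitmapScan are not PostgreSQL
structures, and where/how the counter would actually be stored is an
open question):

```c
#include <assert.h>
#include <stdbool.h>

typedef struct
{
    unsigned long vacuum_gen;   /* bumped each time a heap cleanup
                                 * (TID-removal) pass starts */
} ToyHeap;

typedef struct
{
    unsigned long gen_at_bitmap;    /* counter value noted just before
                                     * amgetbitmap built the bitmap */
} ToyBitmapScan;

/* After each VM lookup, the bitmap heap scan asks: has any cleanup
 * pass started since we built the bitmap?  If so, the all-visible bit
 * can no longer vouch for the TIDs in our (possibly stale) bitmap,
 * and we must fetch the heap page instead of skipping it. */
static bool
vm_bit_trustworthy(const ToyBitmapScan *scan, const ToyHeap *heap)
{
    return heap->vacuum_gen == scan->gen_at_bitmap;
}
```

A false return here would only disable the skip-fetch optimization for
the remainder of the scan, never produce wrong results, which is what
makes the scheme attractive as a way to re-enable the optimization.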
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)