From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
Cc: | Melanie Plageman <melanieplageman(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: BitmapHeapScan streaming read user and prelim refactoring |
Date: | 2025-03-22 21:04:55 |
Message-ID: | gvwc7tvw2jxbxpge3x6aqwi7xdgbk5kckwvhwr3m3nuyjxaxeu@4hgqbpzwpmxj |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2025-03-22 16:42:42 -0400, Andres Freund wrote:
> Hm, it's clearly related to the all-visible path, but I think the bug is
> actually independent of what was reported there. The bug you reported is
> perfectly reproducible, without any concurrency, after all.
>
> Here's a smaller reproducer:
>
> CREATE TABLE bmscantest (a int, t text);
>
> INSERT INTO bmscantest
> SELECT (r%53), 'foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo'
> FROM generate_series(1,70000) r;
>
> CREATE INDEX i_bmtest_a ON bmscantest(a);
> set enable_indexscan=false;
> set enable_seqscan=false;
>
> -- Lower work_mem to trigger use of lossy bitmaps
> set work_mem = 64;
>
> SELECT count(*) FROM bmscantest WHERE a = 1;
> vacuum freeze bmscantest;
> SELECT count(*) FROM bmscantest WHERE a = 1;
>
> -- clean up
> DROP TABLE bmscantest;
>
>
> The first SELECT reports 1321, the second 572 tuples.
The problem is that sometimes recheck is performed for pending empty
tuples. The comment about empty tuples says:
/*
* Bitmap is exhausted. Time to emit empty tuples if relevant. We emit
* all empty tuples at the end instead of emitting them per block we
* skip fetching. This is necessary because the streaming read API
* will only return TBMIterateResults for blocks actually fetched.
* When we skip fetching a block, we keep track of how many empty
* tuples to emit at the end of the BitmapHeapScan. We do not recheck
* all NULL tuples.
*/
*recheck = false;
return bscan->rs_empty_tuples_pending > 0;
But we actually emit empty tuples at each page boundary:
/*
* Out of range? If so, nothing more to look at on this page
*/
while (hscan->rs_cindex >= hscan->rs_ntuples)
{
/*
* Emit empty tuples before advancing to the next block
*/
if (bscan->rs_empty_tuples_pending > 0)
{
/*
* If we don't have to fetch the tuple, just return nulls.
*/
ExecStoreAllNullTuple(slot);
bscan->rs_empty_tuples_pending--;
return true;
}
/*
* Returns false if the bitmap is exhausted and there are no further
* blocks we need to scan.
*/
if (!BitmapHeapScanNextBlock(scan, recheck, lossy_pages, exact_pages))
return false;
}
So we don't just process tuples at the end of the scan, but at each page
boundary.
This leads to wrong results whenever there is a rs_empty_tuples_pending > 0
after a previous page that needed rechecks, because nothing will reset
*recheck = false.
And indeed, if I add a *recheck = false in that inside the
/*
* Emit empty tuples before advancing to the next block
*/
if (bscan->rs_empty_tuples_pending > 0)
{
the problem goes away.
A fix should do slightly more, given that the "at the end" comment is wrong.
But I'm wondering if it's worth fixing it, or if we should just rip the logic
out alltogether, due to the whole VM checking being unsound as we learned in
the thread I referenced earlie, without even bothering to fix the bug here.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Jackson | 2025-03-22 21:10:28 | Update LDAP Protocol in fe-connect.c to v3 |
Previous Message | Sami Imseih | 2025-03-22 20:59:02 | Re: pg_stat_statements and "IN" conditions |