From: | Konstantin Knizhnik <knizhnik(at)garret(dot)ru> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Incorrect result of bitmap heap scan. |
Date: | 2024-12-02 14:25:33 |
Message-ID: | 873c33c5-ef9e-41f6-80b2-2f5e11869f1c@garret.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi hackers,
Attached script reproduces the problem with incorrect results of `select
count(*)` (it returns larger number of records than really available in
the table).
It is not always reproduced, so you may need to repeat it multiple times
- at my system it failed 3 times from 10.
The problem takes place with pg16/17/18 (other versions I have not checked).
The test is called `test_ios` (index-only-scan), but it is not correct.
Index-only scan is not used in this case.
And this is actually the first question to PG17/18: IOS is not used when
number of records is less than 100k (for this particular table):
postgres=# create table t(pk integer primary key); CREATE TABLE
postgres=# set enable_seqscan = off; SET postgres=# set enable_indexscan
= off; SET postgres=# insert into t values (generate_series(1,1000));
INSERT 0 1000 postgres=# vacuum t; VACUUM postgres=# explain select
count(*) from t; QUERY PLAN
-------------------------------------------------------------------------------
Aggregate (cost=43.02..43.03 rows=1 width=8) -> Bitmap Heap Scan on t
(cost=25.52..40.52 rows=1000 width=0) -> Bitmap Index Scan on t_pkey
(cost=0.00..25.27 rows=1000 width=0) (3 rows) postgres=# set
enable_bitmapscan = off; SET postgres=# explain select count(*) from t;
QUERY PLAN ------------------------------------------------------------
Aggregate (cost=17.50..17.51 rows=1 width=8) -> Seq Scan on t
(cost=0.00..15.00 rows=1000 width=0) Disabled: true (3 rows)
So, as you can see, Postgres prefers to use disabled seqscan, but not
IOS. It is different from pg16 where disabling bitmap scan makes
optimizer to choose index-only scan:
postgres=# explain select count(*) from t; QUERY PLAN
----------------------------------------------------------- Aggregate
(cost=41.88..41.88 rows=1 width=8) -> Seq Scan on t (cost=0.00..35.50
rows=2550 width=0) (2 rows) postgres=# set enable_seqscan = off; SET
postgres=# explain select count(*) from t; QUERY PLAN
-------------------------------------------------------------------------------
Aggregate (cost=75.54..75.55 rows=1 width=8) -> Bitmap Heap Scan on t
(cost=33.67..69.17 rows=2550 width=0) -> Bitmap Index Scan on t_pkey
(cost=0.00..33.03 rows=2550 width=0) (3 rows) postgres=# set
enable_bitmapscan = off; SET postgres=# explain select count(*) from t;
QUERY PLAN
-------------------------------------------------------------------------------
Aggregate (cost=45.77..45.78 rows=1 width=8) -> Index Only Scan using
t_pkey on t (cost=0.28..43.27 rows=1000 width=0) (2 rows)
This is strange behavior of pg17 which for some reasons rejects IOS (but
it is used if number of records in the table is 100k or more). But the
main problem is that used plan Bitmap Heap Scan + Bitmap Index Scan may
return incorrect result.
Replacing `select count(*)` with `select count(pk)` eliminates the
problem, as well as disabling of autovacuum. It seems to be clear that
the problem is with visibility map.
We have the following code in heap bitmap scan: /* * We can skip
fetching the heap page if we don't need any fields from the * heap, the
bitmap entries don't need rechecking, and all tuples on the * page are
visible to our transaction. */ if (!(scan->rs_flags & SO_NEED_TUPLES) &&
!tbmres->recheck && VM_ALL_VISIBLE(scan->rs_rd, tbmres->blockno,
&hscan->rs_vmbuffer)) { /* can't be lossy in the skip_fetch case */
Assert(tbmres->ntuples >= 0); Assert(hscan->rs_empty_tuples_pending >=
0); hscan->rs_empty_tuples_pending += tbmres->ntuples; return true; }
So if we do not need tuples (|count(*)|case) and page is marked as
all-visible in VM, then we just count|tbmres->ntuples|elements without
extra checks.
I almost not so familiar with internals of executor, but it is not clear
to me how we avoid race condition between VM update and heap bitmap scan?
Assume that bitmap scan index marks all tids available in index. Some
elements in this bitmap can refer old (invisible) versions. Then vacuum
comes, removes dead elements and mark page as all-visible. After it we
start heap bitmap scan, see that page is all-visible and count all
marked elements on this page including dead (which are not present in
the page any more).
Which lock or check should prevent such scenario?
Attachment | Content-Type | Size |
---|---|---|
test_ios.py | text/x-python-script | 2.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2024-12-02 14:28:00 | Re: Count and log pages set all-frozen by vacuum |
Previous Message | Robert Haas | 2024-12-02 14:14:11 | Re: Consider pipeline implicit transaction as a transaction block |