From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Subject: | BRIN minmax multi - incorrect distance for infinite timestamp/date |
Date: | 2023-10-12 22:38:40 |
Message-ID: | eef0ea8c-4aaa-8d0d-027f-58b1f35dd170@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
Ashutosh Bapat reported me off-list a possible issue in how BRIN
minmax-multi calculate distance for infinite timestamp/date values.
The current code does this:
if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
PG_RETURN_FLOAT8(0);
so means infinite values are "very close" to any other value, and thus
likely to be merged into a summary range. That's exactly the opposite of
what we want to do, possibly resulting in inefficient indexes.
Consider this example
create table test (a timestamptz) with (fillfactor=50);
insert into test
select (now() + ((10000 * random())::int || ' seconds')::interval)
from generate_series(1,1000000) s(i);
update test set a = '-infinity'::timestamptz where random() < 0.01;
update test set a = 'infinity'::timestamptz where random() < 0.01;
explain (analyze, timing off, costs off)
select * from test where a = '2024-01-01'::timestamptz;
QUERY PLAN
------------------------------------------------------------------------------
Bitmap Heap Scan on test (actual rows=0 loops=1)
Recheck Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time zone)
Rows Removed by Index Recheck: 680662
Heap Blocks: lossy=6024
-> Bitmap Index Scan on test_a_idx (actual rows=60240 loops=1)
Index Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time
zone)
Planning Time: 0.075 ms
Execution Time: 106.871 ms
(8 rows)
Clearly, large part of the table gets scanned - this happens because
when building the index, we end up with ranges like this:
[-infinity,a,b,c,...,x,y,z,infinity]
and we conclude that distance for [-infinity,a] is 0, and we combine
these values into a range. And the same for [z,infinity]. But we should
do exactly the opposite thing - never merge those.
Attached is a patch fixing this, with which the plan looks like this:
QUERY PLAN
------------------------------------------------------------------------------
Bitmap Heap Scan on test (actual rows=0 loops=1)
Recheck Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time zone)
-> Bitmap Index Scan on test_a_idx (actual rows=0 loops=1)
Index Cond: (a = '2024-01-01 00:00:00+01'::timestamp with time
zone)
Planning Time: 0.289 ms
Execution Time: 9.432 ms
(6 rows)
Which seems much better.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment | Content-Type | Size |
---|---|---|
brin-infinity-fix.patch | text/x-patch | 1.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Korotkov | 2023-10-12 22:54:01 | Re: On login trigger: take three |
Previous Message | David G. Johnston | 2023-10-12 22:28:51 | Re: Pro et contra of preserving pg_proc oids during pg_upgrade |