Quick Links

Bloom filters and the planner / parallel execution

From:	"Daniel Westermann (DWE)" <daniel(dot)westermann(at)dbi-services(dot)com>
To:	"pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject:	Bloom filters and the planner / parallel execution
Date:	2024-10-08 11:56:16
Message-ID:	GV0P278MB04197233E85D50BB67300DECD27E2@GV0P278MB0419.CHEP278.PROD.OUTLOOK.COM
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi,

previous discussion at [1].

Initially I wanted to send this to the docs mailing list, but while testing this further I think this is more an issue with the planner.

The example(s) in the docs for bloom filters (https://www.postgresql.org/docs/current/bloom.html) cannot be reproduced at all (and the size of the indexes are far off the truth as well).

postgres=# select version();
version
----------------------------------------------------------------------------------------------
PostgreSQL 18devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
(1 row)

postgres=# CREATE TABLE tbloom AS
SELECT
(random() * 1000000)::int as i1,
(random() * 1000000)::int as i2,
(random() * 1000000)::int as i3,
(random() * 1000000)::int as i4,
(random() * 1000000)::int as i5,
(random() * 1000000)::int as i6
FROM
generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# analyze tbloom ;
ANALYZE
postgres=# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
pg_size_pretty
----------------
153 MB
(1 row)

The following statement (from the docs as well) will never use the bloom index but will go for parallel execution:

postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..127244.10 rows=1 width=24) (actual time=155.618..155.652 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbloom (cost=0.00..126244.00 rows=1 width=24) (actual time=145.353..145.354 rows=0 loops=3)
Filter: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Filter: 3333333
Planning Time: 0.241 ms
Execution Time: 155.673 ms
(8 rows)

The index will be used if we make parallel execution a lot more costly:

postgres=# set parallel_setup_cost = 100000;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbloom (cost=178436.00..178440.02 rows=1 width=24) (actual time=31.364..31.365 rows=0 loops=1)
Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Index Recheck: 2424
Heap Blocks: exact=2372
-> Bitmap Index Scan on bloomidx (cost=0.00..178436.00 rows=1 width=0) (actual time=26.387..26.387 rows=2424 loops=1)
Index Cond: ((i2 = 898732) AND (i5 = 123451))
Planning Time: 0.210 ms
Execution Time: 31.511 ms
(8 rows)

postgres=# reset parallel_setup_cost ;
RESET
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..127244.10 rows=1 width=24) (actual time=150.802..150.835 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbloom (cost=0.00..126244.00 rows=1 width=24) (actual time=143.902..143.903 rows=0 loops=3)
Filter: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Filter: 3333333
Planning Time: 0.172 ms
Execution Time: 150.865 ms
(8 rows)

The reason I am sending this to this list is the execution time. The plan using the index is faster than the plan using parallel execution, but it is not the one executed in the default configuration. Am I missing something?

Here is another example where the index should clearly win:

postgres=# create table tdummy as select i as a, i as b, i as c, i as d, i as e, i as f, i as g, i as h from generate_series (1, 20000000) i;
SELECT 20000000
postgres=# CREATE INDEX bloomidx2 ON tdummy USING bloom (a,b,c,d,e,f,g,h);
CREATE INDEX
postgres=# analyze tdummy ;
ANALYZE
postgres=# explain analyze select * from tdummy where a = 1 and b = 1 and c = 1 and d = 1 and e = 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..335576.41 rows=1 width=32) (actual time=0.573..437.776 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tdummy (cost=0.00..334576.31 rows=1 width=32) (actual time=285.408..430.800 rows=0 loops=3)
Filter: ((a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1))
Rows Removed by Filter: 6666666
Planning Time: 0.552 ms
Execution Time: 437.817 ms
(8 rows)

postgres=# set parallel_setup_cost = 10000;
SET
postgres=# explain analyze select * from tdummy where a = 1 and b = 1 and c = 1 and d = 1 and e = 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Gather (cost=10000.00..344576.41 rows=1 width=32) (actual time=0.439..316.780 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tdummy (cost=0.00..334576.31 rows=1 width=32) (actual time=206.356..311.511 rows=0 loops=3)
Filter: ((a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1))
Rows Removed by Filter: 6666666
Planning Time: 0.197 ms
Execution Time: 316.807 ms
(8 rows)

postgres=# explain analyze select * from tdummy where a = 1 and b = 1 and c = 1 and d = 1 and e = 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Gather (cost=10000.00..344572.10 rows=1 width=32) (actual time=1.284..291.015 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tdummy (cost=0.00..334572.00 rows=1 width=32) (actual time=188.019..284.259 rows=0 loops=3)
Filter: ((a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1))
Rows Removed by Filter: 6666666
Planning Time: 0.311 ms
Execution Time: 291.055 ms
(8 rows)

postgres=# set parallel_setup_cost = 100000;
SET
postgres=# explain analyze select * from tdummy where a = 1 and b = 1 and c = 1 and d = 1 and e = 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Gather (cost=100000.00..434572.10 rows=1 width=32) (actual time=0.547..292.466 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tdummy (cost=0.00..334572.00 rows=1 width=32) (actual time=188.160..285.308 rows=0 loops=3)
Filter: ((a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1))
Rows Removed by Filter: 6666666
Planning Time: 0.275 ms
Execution Time: 292.500 ms
(8 rows)

postgres=# set parallel_setup_cost = 1000000;
SET
postgres=# explain analyze select * from tdummy where a = 1 and b = 1 and c = 1 and d = 1 and e = 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tdummy (cost=506868.00..506872.02 rows=1 width=32) (actual time=54.085..54.092 rows=1 loops=1)
Recheck Cond: ((a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1))
Rows Removed by Index Recheck: 1
Heap Blocks: exact=2
-> Bitmap Index Scan on bloomidx2 (cost=0.00..506868.00 rows=1 width=0) (actual time=54.047..54.047 rows=2 loops=1)
Index Cond: ((a = 1) AND (b = 1) AND (c = 1) AND (d = 1) AND (e = 1))
Planning Time: 0.165 ms
Execution Time: 54.847 ms
(8 rows)

Regards
Daniel

[1] https://www.postgresql.org/message-id/flat/ZR0P278MB0122119FAE78721A694C30C8D2340%40ZR0P278MB0122.CHEP278.PROD.OUTLOOK.COM

Responses

Re: Bloom filters and the planner / parallel execution at 2024-11-05 13:22:29 from Rafia Sabih

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Pavel Stehule	2024-10-09 04:24:56	Re: proposal: schema variables
Previous Message	Andrey Stikheev	2024-10-07 17:16:24	Re: Performance degradation in Index searches with special characters