From: | "Daniel Westermann (DWE)" <daniel(dot)westermann(at)dbi-services(dot)com> |
---|---|
To: | Pg Docs <pgsql-docs(at)lists(dot)postgresql(dot)org> |
Subject: | Wrong example in the bloom documentation |
Date: | 2020-09-27 15:58:14 |
Message-ID: | ZR0P278MB0122119FAE78721A694C30C8D2340@ZR0P278MB0122.CHEP278.PROD.OUTLOOK.COM |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-docs pgsql-hackers |
Hi,
I've briefly discussed this with Bruce some time ago in [1].
Replaying the example referenced in the documentation does not give a Bitmap Heap Scan on tbloom but a parallel seq scan with the default configuration:
-- tested on head
postgres=# CREATE TABLE tbloom AS
postgres-# SELECT
postgres-# (random() * 1000000)::int as i1,
postgres-# (random() * 1000000)::int as i2,
postgres-# (random() * 1000000)::int as i3,
postgres-# (random() * 1000000)::int as i4,
postgres-# (random() * 1000000)::int as i5,
postgres-# (random() * 1000000)::int as i6
postgres-# FROM
postgres-# generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..127220.00 rows=250 width=24) (actual time=2134.851..2221.836 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbloom (cost=0.00..126195.00 rows=104 width=24) (actual time=1770.691..1770.692 rows=0 loops=3)
Filter: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Filter: 3333333
Planning Time: 0.895 ms
JIT:
Functions: 6
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 65.512 ms, Inlining 0.000 ms, Optimization 46.328 ms, Emission 40.658 ms, Total 152.499 ms
Execution Time: 2288.056 ms
(12 rows)
As bloom was introduced in 9.6 I quickly tried with 9.6.17 and indeed for this version the example is correct:
postgres=# select version();
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), 64-bit
(1 row)
postgres=# CREATE TABLE tbloom AS
postgres-# SELECT
postgres-# (random() * 1000000)::int as i1,
postgres-# (random() * 1000000)::int as i2,
postgres-# (random() * 1000000)::int as i3,
postgres-# (random() * 1000000)::int as i4,
postgres-# (random() * 1000000)::int as i5,
postgres-# (random() * 1000000)::int as i6
postgres-# FROM
postgres-# generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbloom (cost=178436.06..179392.83 rows=250 width=24) (actual time=2279.363..2279.363 rows=0 loops=1)
Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Index Recheck: 2329
Heap Blocks: exact=2288
-> Bitmap Index Scan on bloomidx (cost=0.00..178436.00 rows=250 width=0) (actual time=994.406..994.406 rows=2329 loops=1)
Index Cond: ((i2 = 898732) AND (i5 = 123451))
Planning time: 282.059 ms
Execution time: 2286.138 ms
(8 rows)
The reason is that parallel execution is disabled by default in 9.6, and if that is turned on the plan changes there as well:
postgres=# set max_parallel_workers_per_gather = 2;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..127194.29 rows=1 width=24) (actual time=1148.047..1148.206 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbloom (cost=0.00..126194.19 rows=1 width=24) (actual time=1039.501..1039.501 rows=0 loops=3)
Filter: ((i2 = 898732) AND (i5 = 123451))
Rows Removed by Filter: 3333333
Planning time: 0.580 ms
Execution time: 1148.247 ms
(8 rows)
Starting with PostgreSQL 10 the example in the documentation is therefore wrong. Attached a proposal to fix this. The new example starts with 100x reduced rows (as suggested by Bruce in [1] and adds a note that the behavior changes as soon as parallel execution is cheaper than the index access.
Thoughts?
Regards
Daniel
Attachment | Content-Type | Size |
---|---|---|
bloom-doc-fix-v01.patch | text/x-patch | 8.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2020-09-28 08:50:57 | Re: Possible mistake in backup documentation |
Previous Message | Tom Lane | 2020-09-26 18:38:50 | Re: ssl file permission |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2020-09-27 15:59:26 | Re: Get rid of runtime handling of AlternativeSubPlan? |
Previous Message | Esteban Zimanyi | 2020-09-27 14:00:37 | Re: Fwd: Extending range type operators to cope with elements |