Re: BUG #18854: PostgreSQL chooses a suboptimal execution plan when using a specific WHERE filter

From: Yoni Sade <yonisade83(at)gmail(dot)com>
To: Andrei Lepikhov <lepihov(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18854: PostgreSQL chooses a suboptimal execution plan when using a specific WHERE filter
Date: 2025-03-19 13:34:48
Message-ID: CA+82DrV6g=R8VE_HT6jYSJEmCBsik1cPvzL62fK5h9WKXrz=nw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I would like to add that commenting out that WHERE clause doesn't work for
all of this query variations so I've found another workaround for now
("LIMIT 1000000" in *bold*)

to make the optimizer better estimate the cardinality of the subquery and
change the plan to a better one:
*JOIN (SELECT *, generate_series(mv.start_year, mv.end_year) as year*
*FROM mv LIMIT 1000000) t3*

This is the previous (suboptimal) plan in the format you asked for (with
table/column names obfuscated):

Aggregate (cost=44501.78..44501.79 rows=1 width=8) (actual time=140784.531..
140784.633 rows=1 loops=1)

Output: count(*)

-> Group (cost=44501.48..44501.68 rows=8 width=679) (actual time=4656.869..
140784.563 rows=153 loops=1)

Output: NULL::integer, t1.***********, t1.***********, t3.***********,
t1.***********, (COALESCE(p.is_new, false)), t3.year

Group Key: t1.***********, t3.***********, (COALESCE(p.is_new, false)), t3.
year

-> Incremental Sort (cost=44501.48..44501.60 rows=8 width=675) (actual time
=4656.866..140784.160 rows=1994 loops=1)

Output: t1.***********, t3.***********, (COALESCE(p.is_new, false)),
t3.***********, t1.***********, t1.***********

Sort Key: t1.***********, t3.***********, (COALESCE(p.is_new, false)), t3.
year

Presorted Key: t1.m_id

Full-sort Groups: 24 Sort Method: quicksort Average Memory: 30kB Peak
Memory: 30kB

Pre-sorted Groups: 15 Sort Method: quicksort Average Memory: 34kB Peak
Memory: 34kB

-> Nested Loop Left Join (cost=29224.92..44501.36 rows=8 width=675) (actual
time=239.881..140777.492 rows=1994 loops=1)

Output: t1.***********, t3.***********, COALESCE(p.is_new, false),
t3.***********, t1.***********, t1.***********

-> Nested Loop (cost=29224.92..44486.38 rows=4 width=679) (actual time
=239.855..140771.693 rows=1994 loops=1)

Output: t1.***********, t1.***********, t1.***********, t3.***********,
t3.***********, t9.***********, t8.***********

Join Filter: (t3.mm_id = t4.mm_id)

Rows Removed by Join Filter: 33142274

-> Nested Loop Left Join (cost=2048.29..13628.25 rows=1 width=679) (actual
time=13.032..1942.680 rows=1994 loops=1)

Output: t1.***********, t1.***********, t1.***********, t3.***********,
t3.***********, t3.***********, t9.***********

-> Nested Loop (cost=2047.87..13623.80 rows=1 width=678) (actual time
=13.010..1906.470 rows=1650 loops=1)

Output: t1.***********, t1.***********, t1.***********, t3.***********,
t3.***********, t3.***********

Join Filter: (t1.m_id = t3.m_id)

Rows Removed by Join Filter: 632234

-> Nested Loop Left Join (cost=1047.87..1060.89 rows=1 width=552) (actual
time=11.398..19.784 rows=37 loops=1)

Output: t1.***********, t1.***********, t1.***********

Inner Unique: true

Join Filter: (t1.m_id = t2.m_id)

Rows Removed by Join Filter: 29232

Filter: COALESCE(((count(*)) > 0), false)

Rows Removed by Filter: 12

-> Index Scan using mds_mk_id_m_id_key on mds m (cost=0.27..8.29 rows=1
width=552) (actual time=0.023..0.223 rows=49 loops=1)

Output: t1.***********, t1.***********, t1.***********, t1.***********,
t1.***********, t1.***********, t1.***********, t1.***********,
t1.***********, t1.***********, t1.***********, t1.***********,
t1.***********

Index Cond: (t1.mk_id = 7)

-> HashAggregate (cost=1047.60..1049.60 rows=200 width=12) (actual time
=0.234..0.360 rows=597 loops=49)

Output: count(*), t2.m_id

Group Key: t2.m_id

Batches: 1 Memory Usage: 209kB

-> Seq Scan on t2 (cost=0.00..993.28 rows=10864 width=4) (actual time
=0.040..7.570 rows=20717 loops=1)

Output: t2.***********, t2.***********, t2.***********, t2.***********,
t2.***********, t2.***********, t2.***********, t2.***********,
t2.***********, t2.***********, t2.***********, t2.***********,
t2.***********, t2.***********, t2.***********, t2.***********,
t2.***********, t2.***********, t2.***********, t2.***********,
t2.***********, t2.***********, t2.***********, t2.***********,
t2.***********, t2.***********, t2.***********, t2.***********,
t2.***********

Filter: t2.is_active

Rows Removed by Filter: 11985

-> Subquery Scan on t3 (cost=1000.00..12562.73 rows=15 width=130) (actual
time=1.405..49.852 rows=17132 loops=37)

Output: t3.***********, t3.***********, t3.***********, t3.***********,
t3.***********, t3.***********, t3.***********, t3.***********,
t3.***********, t3.***********, t3.***********, t3.***********,
t3.***********, t3.***********, t3.***********, t3.***********,
t3.***********, t3.***********, t3.***********, t3.***********,
t3.***********, t3.***********, t3.***********, t3.***********,
t3.***********, t3.***********, t3.***********, t3.***********,
t3.***********, t3.***********, t3.***********, t3.***********,
t3.***********, t3.***********, t3.***********, t3.***********,
t3.***********, t3.***********, t3.***********, t3.***********,
t3.***********, t3.***********, t3.***********, t3.***********,
t3.***********, t3.***********, t3.***********, t3.***********,
t3.***********, t3.***********, t3.***********, t3.***********,
t3.***********, t3.***********

Filter: ((t3.year >= 2021) AND (t3.year <= 2024))

Rows Removed by Filter: 44378

-> Gather (cost=1000.00..12516.33 rows=3093 width=2335) (actual time=1.365..
44.739 rows=61510 loops=37)

Output: NULL::timestamp without time zone, NULL::timestamp without time zone,
mv.***********, NULL::text, NULL::integer, mv.***********, NULL::character
varying(50), mv.***********, NULL::character varying(255), NULL::integer,
NULL::character varying, NULL::integer, NULL::character varying(100), NULL::
integer, NULL::character varying(50), NULL::integer, NULL::character varying
(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::
character varying(50), NULL::integer, NULL::integer, NULL::integer, NULL::
character varying(50), NULL::integer, NULL::character varying(50), NULL::
integer, NULL::character varying(50), NULL::integer, NULL::character varying
(50), NULL::integer, NULL::text, NULL::text, NULL::integer, NULL::integer,
NULL::integer, NULL::integer, NULL::boolean, NULL::boolean, NULL::boolean,
NULL::boolean, NULL::integer[], NULL::text[], NULL::text[], NULL::text, NULL
::text, NULL::boolean, NULL::integer, NULL::integer, NULL::integer, NULL::
timestamp without time zone, NULL::timestamp without time zone, (
generate_series(mv.s_year, mv.e_year))

Workers Planned: 1

Workers Launched: 1

-> ProjectSet (cost=0.00..11207.03 rows=1819000 width=2335) (actual time
=0.034..25.766 rows=30755 loops=74)

Output: NULL::timestamp without time zone, NULL::timestamp without time zone,
mv.***********, NULL::text, NULL::integer, mv.***********, NULL::character
varying(50), mv.***********, NULL::character varying(255), NULL::integer,
NULL::character varying, NULL::integer, NULL::character varying(100), NULL::
integer, NULL::character varying(50), NULL::integer, NULL::character varying
(50), NULL::integer, NULL::character varying(50), NULL::integer, NULL::
character varying(50), NULL::integer, NULL::integer, NULL::integer, NULL::
character varying(50), NULL::integer, NULL::character varying(50), NULL::
integer, NULL::character varying(50), NULL::integer, NULL::character varying
(50), NULL::integer, NULL::text, NULL::text, NULL::integer, NULL::integer,
NULL::integer, NULL::integer, NULL::boolean, NULL::boolean, NULL::boolean,
NULL::boolean, NULL::integer[], NULL::text[], NULL::text[], NULL::text, NULL
::text, NULL::boolean, NULL::integer, NULL::integer, NULL::integer, NULL::
timestamp without time zone, NULL::timestamp without time zone,
generate_series(mv.s_year, mv.e_year)

Worker 0: actual time=0.019..33.874 rows=40139 loops=37

-> Parallel Seq Scan on mv (cost=0.00..2098.39 rows=1819 width=134) (actual
time=0.029..10.233 rows=10358 loops=74)

Output: mv.***********, mv.***********, mv.***********, mv.***********,
mv.***********, mv.***********, mv.***********, mv.***********,
mv.***********, mv.***********, mv.***********, mv.***********,
mv.***********, mv.***********, mv.***********, mv.***********,
mv.***********, mv.***********, mv.***********, mv.***********,
mv.***********, mv.***********, mv.***********, mv.***********,
mv.***********, mv.***********, mv.***********, mv.***********,
mv.***********, mv.***********, mv.***********, mv.***********,
mv.***********, mv.***********, mv.***********, mv.***********,
mv.***********, mv.***********, mv.***********, mv.***********,
mv.***********, mv.***********, mv.***********, mv.***********,
mv.***********, mv.***********, mv.***********, mv.***********,
mv.***********, mv.***********, mv.***********, mv.***********,
mv.***********

Filter: mv.is_active

Rows Removed by Filter: 5992

Worker 0: actual time=0.016..13.530 rows=13204 loops=37

-> Index Only Scan using t9_un on t9 (cost=0.42..4.44 rows=1 width=9)
(actual time=0.014..0.018 rows=1 loops=1650)

Output: t9.***********, t9.***********, t9.***********

Index Cond: ((p.mm_id = t3.mm_id) AND (p.year = t3.year))

Heap Fetches: 0

-> Hash Join (cost=27176.63..30848.49 rows=771 width=8) (actual time=0.084..
68.596 rows=16622 loops=1994)

Output: t4.***********, t8.sm_f_id

Hash Cond: (t4.f_id = t8.mm_f_id)

-> Unique (cost=27175.38..28718.18 rows=154280 width=126) (actual time
=0.084..50.368 rows=166526 loops=1994)

Output: t4.***********, t4.***********, t7.***********

-> Sort (cost=27175.38..27561.08 rows=154280 width=126) (actual time=0.083..
19.173 rows=166526 loops=1994)

Output: t4.***********, t4.***********, t7.***********

Sort Key: t4.***********, t4.***********, t7.***********

Sort Method: external sort Disk: 6912kB

-> Append (cost=16.07..3859.65 rows=154280 width=126) (actual time=0.025..
67.960 rows=166526 loops=1)

-> Hash Left Join (cost=16.07..773.18 rows=38220 width=126) (actual time
=0.024..12.886 rows=40136 loops=1)

Output: t4.***********, t4.***********, t7.***********

Inner Unique: true

Hash Cond: (t4.f_id = t7.f_id)

-> Seq Scan on t4 (cost=0.00..655.20 rows=38220 width=8) (actual time
=0.005..3.407 rows=40136 loops=1)

Output: t4.***********, t4.***********, t4.***********, t4.***********,
t4.***********, t4.***********, t4.***********

-> Hash (cost=12.70..12.70 rows=270 width=122) (actual time=0.013..0.014
rows=17 loops=1)

Output: t7.***********, t7.***********

Buckets: 1024 Batches: 1 Memory Usage: 9kB

-> Seq Scan on t7 (cost=0.00..12.70 rows=270 width=122) (actual time=0.004..
0.008 rows=17 loops=1)

Output: t7.***********, t7.***********

-> Hash Left Join (cost=15.85..2315.07 rows=116060 width=126) (actual time
=0.036..41.281 rows=126390 loops=1)

Output: t10.***********, t10.***********, t5.************

Inner Unique: true

Hash Cond: (t10.t_id = t5.t_id)

-> Seq Scan on t10 (cost=0.00..1989.60 rows=116060 width=8) (actual time
=0.012..10.756 rows=126390 loops=1)

Output: t10.***********, t10.***********, t10.***********, t10.***********,
t10.***********, t10.***********, t10.***********

-> Hash (cost=12.60..12.60 rows=260 width=122) (actual time=0.013..0.014
rows=16 loops=1)

Output: t5.***********, t5.***********

Buckets: 1024 Batches: 1 Memory Usage: 9kB

-> Seq Scan on t5 (cost=0.00..12.60 rows=260 width=122) (actual time=0.005..
0.008 rows=16 loops=1)

Output: t5.***********, t5.***********

-> Hash (cost=1.24..1.24 rows=1 width=8) (actual time=0.011..0.011 rows=1
loops=1)

Output: t8.***********, t8.***********

Buckets: 1024 Batches: 1 Memory Usage: 9kB

-> Seq Scan on t8 (cost=0.00..1.24 rows=1 width=8) (actual time=0.008..0.008
rows=1 loops=1)

Output: t8.***********, t8.***********

Filter: (t8.sm_f_id = 8)

Rows Removed by Filter: 18

-> Materialize (cost=0.00..14.88 rows=2 width=4) (actual time=0.001..0.001
rows=1 loops=1994)

Output: t7.***********

-> Seq Scan on t7 (cost=0.00..14.88 rows=2 width=4) (actual time=0.017..
0.019 rows=1 loops=1)

Output: t7.***********

Filter: (t7.f_id = 8)

Rows Removed by Filter: 9

Query Identifier: -4992889262453896709

Planning Time: 1.762 ms

Execution Time: 140786.468 ms

‫בתאריך יום ג׳, 18 במרץ 2025 ב-16:16 מאת ‪Andrei Lepikhov‬‏ <‪
lepihov(at)gmail(dot)com‬‏>:‬

> On 3/18/25 12:34, PG Bug reporting form wrote:
> > Query has a specific WHERE filter for tiny lookup table "t8" (19 rows)
> which
> > generates an execution plan which has two "Nested Loop" steps instead of
> > "Hash Join" steps which makes the query finish after 90 seconds (and
> spiking
> > the server CPU when being ran by dozens of sessions simultaneously)
> instead
> > of ~3 seconds (when commenting out that specific WHERE filter clause I
> > mentioned below):
> May you provide EXPLAIN ANALYZE VERBOSE in standard text output format?
>
> --
> regards, Andrei Lepikhov
>

--
בברכה,
יוני שדה

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tomas Vondra 2025-03-19 13:43:34 Re: BUG #18855: Using BRIN index with int8_bloom_ops produces incorrect results
Previous Message Inzamam Shafiq 2025-03-19 12:49:14 Possible bug acldefault function