Quick Links

Re: Strange runtime partition pruning behaviour with 11.4

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andreas Kretschmer <andreas(at)a-kretschmer(dot)de>
Cc:	pgsql-performance(at)lists(dot)postgresql(dot)org
Subject:	Re: Strange runtime partition pruning behaviour with 11.4
Date:	2019-08-03 16:05:53
Message-ID:	14075.1564848353@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Andreas Kretschmer <andreas(at)a-kretschmer(dot)de> writes:
> Am 03.08.19 um 16:06 schrieb Thomas Kellerer:
>> But I'm more confused (or concerned) by the fact that the (original)
>> query works correctly *without* statistics.

> can't reproduce that :-( (PG 11.4 Community)

Yeah, I get the same plan with or without ANALYZE, too. In this example,
having the ANALYZE stats barely moves the rowcount estimates for
foo_bar_baz at all, so it's not surprising that the plan doesn't change.
(I do wonder how Thomas got a different outcome...)

Given the shape of the preferred plan:

Finalize Aggregate (cost=15779.59..15779.60 rows=1 width=8) (actual time=160.329..160.330 rows=1 loops=1)
-> Gather (cost=15779.38..15779.59 rows=2 width=8) (actual time=160.011..161.712 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=14779.38..14779.39 rows=1 width=8) (actual time=154.675..154.675 rows=1 loops=3)
-> Hash Join (cost=1.09..14612.90 rows=66590 width=0) (actual time=86.814..144.793 rows=100500 loops=3)
Hash Cond: (fbb_1.foo_id = foo.foo_id)
-> Parallel Append (cost=0.00..12822.21 rows=399537 width=4) (actual time=0.019..95.644 rows=318950 loops=3)
-> Parallel Seq Scan on foo_bar_baz_1 fbb_1 (cost=0.00..3403.53 rows=177353 width=4) (actual time=0.012..18.881 rows=100500 loops=3)
-> Parallel Seq Scan on foo_bar_baz_2 fbb_2 (cost=0.00..3115.53 rows=162353 width=4) (actual time=0.018..51.716 rows=276000 loops=1)
-> Parallel Seq Scan on foo_bar_baz_3 fbb_3 (cost=0.00..2031.82 rows=105882 width=4) (actual time=0.011..16.854 rows=90000 loops=2)
-> Parallel Seq Scan on foo_bar_baz_4 fbb_4 (cost=0.00..1584.00 rows=82500 width=4) (actual time=0.011..26.950 rows=140250 loops=1)
-> Parallel Seq Scan on foo_bar_baz_5 fbb_5 (cost=0.00..667.65 rows=34765 width=4) (actual time=0.014..11.896 rows=59100 loops=1)
-> Parallel Seq Scan on foo_bar_baz_0 fbb (cost=0.00..22.00 rows=1200 width=4) (actual time=0.001..0.001 rows=0 loops=1)
-> Hash (cost=1.07..1.07 rows=1 width=4) (actual time=0.038..0.038 rows=1 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on foo (cost=0.00..1.07 rows=1 width=4) (actual time=0.021..0.023 rows=1 loops=3)
Filter: ((foo_name)::text = 'eeny'::text)
Rows Removed by Filter: 5

it's obvious that no pruning can happen, run-time or otherwise,
because the partitioned table is being scanned on the outside
of the join --- so the target value of foo_id isn't available.

We can force the planner to its second best choice with
set enable_hashjoin to 0;

and then we get

Aggregate (cost=31954.09..31954.10 rows=1 width=8) (actual time=420.158..420.158 rows=1 loops=1)
-> Nested Loop (cost=0.00..31554.55 rows=159815 width=0) (actual time=0.058..389.974 rows=301500 loops=1)
Join Filter: (fbb.foo_id = foo.foo_id)
Rows Removed by Join Filter: 655350
-> Seq Scan on foo (cost=0.00..1.07 rows=1 width=4) (actual time=0.025..0.028 rows=1 loops=1)
Filter: ((foo_name)::text = 'eeny'::text)
Rows Removed by Filter: 5
-> Append (cost=0.00..19567.35 rows=958890 width=4) (actual time=0.026..280.510 rows=956850 loops=1)
-> Seq Scan on foo_bar_baz_0 fbb (cost=0.00..30.40 rows=2040 width=4) (actual time=0.003..0.003 rows=0 loops=1)
-> Seq Scan on foo_bar_baz_1 fbb_1 (cost=0.00..4645.00 rows=301500 width=4) (actual time=0.022..57.836 rows=301500 loops=1)
-> Seq Scan on foo_bar_baz_2 fbb_2 (cost=0.00..4252.00 rows=276000 width=4) (actual time=0.019..51.834 rows=276000 loops=1)
-> Seq Scan on foo_bar_baz_3 fbb_3 (cost=0.00..2773.00 rows=180000 width=4) (actual time=0.016..31.951 rows=180000 loops=1)
-> Seq Scan on foo_bar_baz_4 fbb_4 (cost=0.00..2161.50 rows=140250 width=4) (actual time=0.015..24.392 rows=140250 loops=1)
-> Seq Scan on foo_bar_baz_5 fbb_5 (cost=0.00..911.00 rows=59100 width=4) (actual time=0.012..10.252 rows=59100 loops=1)

This is a good deal slower, and the planner correctly estimates that it's
a good deal slower, so that's why it didn't get picked.

But ... why didn't any run-time pruning happen? Because the shape of the
plan is still wrong: the join condition is being applied at the nestloop
node. If we'd pushed down the foo_id condition to the foo_bar_baz scan
then there'd be hope of pruning.

I think the reason that that isn't happening is that the planner has
not been taught that run-time pruning is a thing, so it's not giving
any cost preference to doing things in a way that would enable that.
It's not entirely clear what the cost estimate adjustments should be,
but obviously somebody had better work on that.

regards, tom lane

In response to

Re: Strange runtime partition pruning behaviour with 11.4 at 2019-08-03 15:18:19 from Andreas Kretschmer

Responses

Re: Strange runtime partition pruning behaviour with 11.4 at 2019-08-05 07:29:33 from Thomas Kellerer

Browse pgsql-performance by date

	From	Date	Subject
Next Message	MichaelDBA	2019-08-03 16:49:14	Re: Strange runtime partition pruning behaviour with 11.4
Previous Message	Andreas Kretschmer	2019-08-03 15:18:19	Re: Strange runtime partition pruning behaviour with 11.4