Re: Query run in 27s with 15.2 vs 37ms with 14.6

From: Charles <peacech(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Query run in 27s with 15.2 vs 37ms with 14.6
Date: 2023-02-21 01:54:33
Message-ID: CABthHP-maAwqVEUfvMK0sWp3oaPQuOg=+fRj5hJGFBXNBDJX7A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, Feb 21, 2023 at 3:58 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> Greetings,
>
> * Charles (peacech(at)gmail(dot)com) wrote:
> > Wrapping the query with a select * from (...) t where length(code) = 4
> puts
> > the execution time back to 27 seconds.
> >
> > This is a bit unexpected since I expect that the result from the inner
> > query to be executed first and then filtered.
>
> It's really not- PG will (correctly) attempt to pull in such subselects
> into the overall optimization, which is generally better for everyone.
> If you want to force it, you can use a WITH MATERIALIZED CTE, or throw
> in an 'OFFSET 0' as a hack into your sub-select, but really it's a much
> better idea to generate extended stats on what you're filtering as has
> been suggested, or come up with a better data representation where
> you're not doing a search on a 'length()' as you are.
>
>
>
Generating extended statistics on the expression (length(code)) causes the
planner to generate has join which runs in 183ms (still slower than 37ms on
14.6). Using materialized cte (no extended stats) results in nested loop
that runs in 229ms. I think I'll revert back to postgresql 14 for now.
Thank you for the suggestions.

with extended stats:
Hash Join (cost=65865.64..69478.05 rows=244 width=89) (actual
time=171.915..183.674 rows=779 loops=1)
Hash Cond: (s.code = va.code)
-> Bitmap Heap Scan on stock_price s (cost=16.12..3624.83 rows=244
width=29) (actual time=0.079..0.238 rows=779 loops=1)
Recheck Cond: (date = '2023-02-20'::date)
Filter: ((open > 0) AND (value > 0) AND (length(code) = 4))
Rows Removed by Filter: 210
Heap Blocks: exact=34
-> Bitmap Index Scan on idx_stock_price_date (cost=0.00..16.06
rows=1017 width=0) (actual time=0.042..0.042 rows=1827 loops=1)
Index Cond: (date = '2023-02-20'::date)
-> Hash (cost=65836.90..65836.90 rows=1010 width=37) (actual
time=171.827..183.109 rows=907 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 49kB
-> Subquery Scan on va (cost=65563.34..65836.90 rows=1010
width=37) (actual time=169.957..182.987 rows=907 loops=1)
-> Finalize GroupAggregate (cost=65563.34..65826.80
rows=1010 width=37) (actual time=169.957..182.947 rows=907 loops=1)
Group Key: stock_price.code
InitPlan 1 (returns $0)
-> Limit (cost=0.53..0.58 rows=1 width=4) (actual
time=0.012..0.012 rows=1 loops=1)
-> Index Only Scan Backward using
stock_date_pkey on stock_date (cost=0.28..363.63 rows=7359 width=4)
(actual time=0.010..0.011 rows=6 loops=1)
Heap Fetches: 6
-> Gather Merge (cost=65562.76..65798.45 rows=2020
width=37) (actual time=169.947..181.671 rows=2616 loops=1)
Workers Planned: 2
Params Evaluated: $0
Workers Launched: 2
-> Sort (cost=64562.74..64565.26 rows=1010
width=37) (actual time=156.536..156.575 rows=872 loops=3)
Sort Key: stock_price.code
Sort Method: quicksort Memory: 94kB
Worker 0: Sort Method: quicksort Memory:
93kB
Worker 1: Sort Method: quicksort Memory:
90kB
-> Partial HashAggregate
(cost=64499.71..64512.34 rows=1010 width=37) (actual time=156.002..156.246
rows=872 loops=3)
Group Key: stock_price.code
Batches: 1 Memory Usage: 577kB
Worker 0: Batches: 1 Memory Usage:
577kB
Worker 1: Batches: 1 Memory Usage:
577kB
-> Parallel Seq Scan on stock_price
(cost=0.00..64038.54 rows=92234 width=13) (actual time=68.301..155.471
rows=1427 loops=3)
Filter: ((value > 0) AND (date
> $0))
Rows Removed by Filter: 906975
Planning Time: 0.340 ms
Execution Time: 183.833 ms

materialized cte:
Nested Loop (cost=65839.48..69480.90 rows=1 width=89) (actual
time=156.363..229.143 rows=779 loops=1)
Join Filter: (s.code = va.code)
Rows Removed by Join Filter: 705774
CTE vol_avg
-> Finalize GroupAggregate (cost=65561.00..65823.42 rows=1006
width=37) (actual time=154.575..156.254 rows=907 loops=1)
Group Key: stock_price.code
InitPlan 1 (returns $0)
-> Limit (cost=0.53..0.58 rows=1 width=4) (actual
time=0.013..0.013 rows=1 loops=1)
-> Index Only Scan Backward using stock_date_pkey on
stock_date (cost=0.28..363.63 rows=7359 width=4) (actual time=0.011..0.012
rows=6 loops=1)
Heap Fetches: 6
-> Gather Merge (cost=65560.42..65795.17 rows=2012 width=37)
(actual time=154.564..155.143 rows=2625 loops=1)
Workers Planned: 2
Params Evaluated: $0
Workers Launched: 2
-> Sort (cost=64560.40..64562.92 rows=1006 width=37)
(actual time=138.777..138.822 rows=875 loops=3)
Sort Key: stock_price.code
Sort Method: quicksort Memory: 92kB
Worker 0: Sort Method: quicksort Memory: 93kB
Worker 1: Sort Method: quicksort Memory: 93kB
-> Partial HashAggregate (cost=64497.65..64510.23
rows=1006 width=37) (actual time=138.193..138.455 rows=875 loops=3)
Group Key: stock_price.code
Batches: 1 Memory Usage: 577kB
Worker 0: Batches: 1 Memory Usage: 577kB
Worker 1: Batches: 1 Memory Usage: 577kB
-> Parallel Seq Scan on stock_price
(cost=0.00..64038.54 rows=91822 width=13) (actual time=133.465..137.738
rows=1427 loops=3)
Filter: ((value > 0) AND (date > $0))
Rows Removed by Filter: 906975
-> Bitmap Heap Scan on stock_price s (cost=16.06..3624.77 rows=1
width=29) (actual time=0.071..0.570 rows=779 loops=1)
Recheck Cond: (date = '2023-02-20'::date)
Filter: ((open > 0) AND (value > 0) AND (length(code) = 4))
Rows Removed by Filter: 210
Heap Blocks: exact=34
-> Bitmap Index Scan on idx_stock_price_date (cost=0.00..16.06
rows=1017 width=0) (actual time=0.036..0.036 rows=1827 loops=1)
Index Cond: (date = '2023-02-20'::date)
-> CTE Scan on vol_avg va (cost=0.00..20.12 rows=1006 width=64) (actual
time=0.198..0.244 rows=907 loops=779)
Planning Time: 0.499 ms
Execution Time: 229.381 ms

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Stephen Frost 2023-02-21 02:02:50 Re: Query run in 27s with 15.2 vs 37ms with 14.6
Previous Message Charles 2023-02-21 01:38:27 Re: Query run in 27s with 15.2 vs 37ms with 14.6