Selection not "pushed down into" CTE

From: Clemens Eisserer <linuxhippy(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Selection not "pushed down into" CTE
Date: 2024-01-07 07:37:17
Message-ID: CAFvQSYSWH2mJbnh0A16TXppm71PsYF1E9pnTw7Vi751hgpKufg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

running postgresql 15.5 I was recently surpised postgresql didn't
perform an optimization which I thought would be easy to apply.
so in this case I don't have an actual performance problem but I am
rather curious if this is limitation in postgresql or whether there is
a semantic difference in the two queries below.

running the following query results in a full sort (caused by lead
over order by) as the ts > '2024-01-04' selection doesn't seem to be
applied to the CTE but only later:

with cte as (select ts, lead(ts, 1) over (order by ts) as ts2 from smartmeter)
select ts, ts2 from cte where ts > '2024-01-04' and extract(epoch
from ts2) - extract(epoch from ts) > 9;

--------
Subquery Scan on cte (cost=1116514.38..1419735.26 rows=253 width=16)
(actual time=117487.536..117999.668 rows=10 loops=1)
Filter: ((cte.ts > '2024-01-04 00:00:00+00'::timestamp with time
zone) AND ((EXTRACT(epoch FROM cte.ts2) - EXTRACT(epoch FROM cte.ts))
> '9'::numeric))
Rows Removed by Filter: 7580259
-> WindowAgg (cost=1116514.38..1249173.52 rows=7580522 width=16)
(actual time=67016.787..114141.495 rows=7580269 loops=1)
-> Sort (cost=1116514.38..1135465.69 rows=7580522 width=8)
(actual time=67016.685..81802.822 rows=7580269 loops=1)
Sort Key: smartmeter.ts
Sort Method: external merge Disk: 89024kB
-> Seq Scan on smartmeter (cost=0.00..146651.22
rows=7580522 width=8) (actual time=7.251..56715.002 rows=7580269
loops=1)
Planning Time: 0.502 ms
Execution Time: 118100.528 ms

whereas if ts > '2024-01-04' is already filtered in the CTE the query
performs a lot better:

with cte as (select ts, lead(ts, 1) over (order by ts) as ts2 from
smartmeter where ts > '2024-01-04')
select ts, ts2 from cte where extract(epoch from ts2) - extract(epoch
from ts) > 9;

-----------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on cte (cost=74905.42..74933.84 rows=253 width=16)
(actual time=334.654..804.286 rows=10 loops=1)
Filter: ((EXTRACT(epoch FROM cte.ts2) - EXTRACT(epoch FROM cte.ts))
> '9'::numeric)
Rows Removed by Filter: 57021
-> WindowAgg (cost=74905.42..74918.68 rows=758 width=16) (actual
time=263.950..550.566 rows=57031 loops=1)
-> Sort (cost=74905.42..74907.31 rows=758 width=8) (actual
time=263.893..295.188 rows=57031 loops=1)
Sort Key: smartmeter.ts
Sort Method: quicksort Memory: 1537kB
-> Bitmap Heap Scan on smartmeter
(cost=16.37..74869.16 rows=758 width=8) (actual time=170.485..243.591
rows=57031 loops=1)
Recheck Cond: (ts > '2024-01-04
00:00:00+00'::timestamp with time zone)
Rows Removed by Index Recheck: 141090
Heap Blocks: lossy=1854
-> Bitmap Index Scan on smartmeter_ts_idx
(cost=0.00..16.18 rows=76345 width=0) (actual time=1.142..1.144
rows=18540 loops=1)
Index Cond: (ts > '2024-01-04
00:00:00+00'::timestamp with time zone)
Planning Time: 0.565 ms
Execution Time: 804.474 ms
(15 rows)

Thanks a lot, Clemens

The DDL of the table in question is:

CREATE TABLE public.smartmeter (
leistungsfaktor real,
momentanleistung integer,
spannungl1 real,
spannungl2 real,
spannungl3 real,
stroml1 real,
stroml2 real,
stroml3 real,
wirkenergien real,
wirkenergiep real,
ts timestamp with time zone NOT NULL
);
CREATE INDEX smartmeter_ts_idx ON public.smartmeter USING brin (ts);

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2024-01-07 16:55:51 Re: Selection not "pushed down into" CTE
Previous Message Darwin Correa 2024-01-06 21:09:29 Re: Slow GroupAggregate and Sort