effect on planner of turning a subquery to a table, sql function returning table

From: Thierry Henrio <thierry(dot)henrio(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: effect on planner of turning a subquery to a table, sql function returning table
Date: 2024-04-12 10:33:06
Message-ID: CAMPYKo0CNfgU64VHtFe0pCP4_H9-bTXQGYKpdZLh8CcdWyGbyw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi there,

I work on a booking system.
Below is a query showing reservations of devices that overlaps a campaign.
A campaign has duration and time intervals by day of week.

Here is a query (A):

select
device_id,
t.date,
timerange(t.start_time, t.end_time) * g.times as times
from device_timeslots t
join (
select date, dow, timerange(start, stop) as times
from campaigns c
cross join generate_series(c.start_date, c.end_date, '1 day') d(date)
join unnest_timeslots(c.timeslots) t(dow, start, stop) on
t.dow=extract(dow from date)
where id=11870
) g on g.date=t.date
where t.date between '2024-04-26' and '2024-04-26'
group by device_id, t.date, t.start_time, t.end_time, g.times;

The time intervals of the campaign are stored as a jsonb (c.timeslots) and
expanded by an IMMUTABLE sql function unnest_timeslots returning a table.

The query (A) runs in 440.497 ms.

When I replace subquery with a temp table (B):

create temp table z11870 as (
select date, dow, timerange(start, stop) as times
from campaigns c
cross join generate_series(c.start_date, c.end_date, '1 day') d(date)
join unnest_timeslots(c.timeslots) t(dow, start, stop) on
t.dow=extract(dow from date)
where id=11870
);

select
device_id,
t.date,
timerange(t.start_time, t.end_time) * z.times as times
from device_timeslots t
join z11870 z on z.date=t.date
where t.date between '2024-04-26' and '2024-04-26'
group by device_id, t.date, t.start_time, t.end_time, g.times;

The query (B) runs in 48.160 ms.

Here is (B) execution plan:

GroupAggregate (cost=70121.37..71282.14 rows=33165 width=124)
Group Key: t.device_id, t.date, t.start_time, t.end_time, z.times
-> Sort (cost=70121.37..70204.28 rows=33165 width=64)
Sort Key: t.device_id, t.date, t.start_time, t.end_time, z.times,
t.rank
-> Merge Join (cost=67127.99..67631.11 rows=33165 width=64)
Merge Cond: (z.date = t.date)
-> Sort (cost=78.60..81.43 rows=1130 width=40)
Sort Key: z.date
-> Seq Scan on z11870 z (cost=0.00..21.30 rows=1130
width=40)
-> Sort (cost=67049.39..67109.04 rows=23861 width=32)
Sort Key: t.date
-> Bitmap Heap Scan on device_timeslots t
(cost=329.01..65314.41 rows=23861 width=32)
Recheck Cond: ((date >= '2024-04-26'::date) AND
(date <= '2024-04-26'::date))
-> Bitmap Index Scan on
device_timeslots_date_index (cost=0.00..323.05 rows=23861 width=0)
Index Cond: ((date >= '2024-04-26'::date)
AND (date <= '2024-04-26'::date))

, whereas the plan of (A) is:

GroupAggregate (cost=401037.82..503755.82 rows=1467400 width=124)
Group Key: t.device_id, t.date, t.start_time, t.end_time,
(timerange(((t_1.value ->> 0))::time without time zone, ((t_1.value ->>
1))::time without time zone))
-> Sort (cost=401037.82..404706.32 rows=1467400 width=96)
Sort Key: t.device_id, t.date, t.start_time, t.end_time,
(timerange(((t_1.value ->> 0))::time without time zone, ((t_1.value ->>
1))::time without time zone)), t.rank
-> Nested Loop (cost=2.99..100268.62 rows=1467400 width=96)
-> Nested Loop (cost=2.98..55962.20 rows=14674 width=64)
-> Nested Loop (cost=2.54..39.31 rows=500 width=40)
-> Index Scan using campaigns_pkey on campaigns
c (cost=0.28..8.30 rows=1 width=355)
Index Cond: (id = 11870)
-> Hash Join (cost=2.26..26.01 rows=500
width=40)
Hash Cond: (EXTRACT(dow FROM d.date) =
((j.dow)::integer)::numeric)
-> Function Scan on generate_series d
(cost=0.01..10.01 rows=1000 width=8)
-> Hash (cost=1.00..1.00 rows=100
width=64)
-> Function Scan on jsonb_each j
(cost=0.00..1.00 rows=100 width=64)
-> Index Scan using device_timeslots_date_index on
device_timeslots t (cost=0.43..111.56 rows=29 width=32)
Index Cond: ((date = d.date) AND (date >=
'2024-04-26'::date) AND (date <= '2024-04-26'::date))
-> Memoize (cost=0.01..1.01 rows=100 width=32)
Cache Key: j.times
Cache Mode: binary
-> Function Scan on jsonb_array_elements t_1
(cost=0.00..1.00 rows=100 width=32)

The Merge Join of (B) provides better timing than the Nested Loop of (A)...

On the options I think:

O1) change the design, add a table much like the z11870

O2) Is there a way to hint planner to materialize a subquery?

O3) other?

Cheers!

, Thierry

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Laura Smith 2024-04-12 10:43:52 Obsolete Linux downloads (Debian) instructions
Previous Message Adrian Klaver 2024-04-11 16:30:32 Re: Question on trigger