From: | Didier Carlier <didier(dot)carlier(at)haulogy(dot)net> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Finding out why parallel queries not avoided |
Date: | 2018-07-21 08:15:25 |
Message-ID: | 086587D2-BF35-4919-B4F5-FAAA169E053E@haulogy.net |
Lists: | pgsql-general |
I’m trying to find out why parallel queries are sometimes not used.
For example, I have 2 tables, calendar (1 row per day, ~3K rows) and measure (~300M rows) which includes a FK to calendar.
I.e., knowing two day numbers, I can count the measures between these two dates with:
select count(*) from measure m where m.fromdateid >=1462 and m.fromdateid < 1826;
(1462 and 1826 are the calendar ids corresponding to 2015-01-01 and 2015-12-31)
This uses parallel query:
explain select count(*) from measure m where m.fromdateid >=1462 and m.fromdateid < 1826;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=3894860.64..3894860.65 rows=1 width=8)
   ->  Gather  (cost=3894860.61..3894860.62 rows=8 width=8)
         Workers Planned: 8
         ->  Partial Aggregate  (cost=3894860.61..3894860.62 rows=1 width=8)
               ->  Parallel Bitmap Heap Scan on measure m  (cost=11265.96..3881068.52 rows=5516835 width=0)
                     Recheck Cond: ((fromdateid >= 1462) AND (fromdateid < 1826))
                     ->  Bitmap Index Scan on idx_measure_fromdate  (cost=0.00..232.29 rows=44134699 width=0)
                           Index Cond: ((fromdateid >= 1462) AND (fromdateid < 1826))
The "equivalent" query without hard-coding the day numbers gives this query plan:
explain select count(*) from calendar c1, calendar c2, measure m where
c1.stddate='2015-01-01' and c2.stddate='2015-12-31' and m.fromdateid >=c1.calendarid and m.fromdateid < c2.calendarid;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=5073362.73..5073362.74 rows=1 width=8)
   ->  Nested Loop  (cost=8718.47..4988195.81 rows=34066770 width=0)
         ->  Index Scan using calendar_stddate_unique on calendar c2  (cost=0.28..2.30 rows=1 width=4)
               Index Cond: (stddate = '2015-12-31 00:00:00+01'::timestamp with time zone)
         ->  Nested Loop  (cost=8718.19..4647525.81 rows=34066770 width=4)
               ->  Index Scan using calendar_stddate_unique on calendar c1  (cost=0.28..2.30 rows=1 width=4)
                     Index Cond: (stddate = '2015-01-01 00:00:00+01'::timestamp with time zone)
               ->  Bitmap Heap Scan on measure m  (cost=8717.91..4306855.81 rows=34066770 width=4)
                     Recheck Cond: ((fromdateid >= c1.calendarid) AND (fromdateid < c2.calendarid))
                     ->  Bitmap Index Scan on idx_measure_fromdate  (cost=0.00..201.22 rows=34072527 width=0)
                           Index Cond: ((fromdateid >= c1.calendarid) AND (fromdateid < c2.calendarid))
Both queries return the same answer, but I don't see why the second one doesn't use a parallel query.
I've tried a few different ways to express the same thing (e.g. a subselect, a CTE, etc.) to make the query planner's work easier, but it always avoids the parallel query.
I also set the parallel_tuple_cost and parallel_setup_cost to 0 without success.
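For reference, a minimal sketch of the two attempts described above. The message does not show the exact statements, so this assumes the cost settings were changed per session with plain SET commands, and the subselect shown is only one hypothetical way to write that variant:

```sql
-- Assumption: settings applied at session level; the original post does not say how.
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;

-- Hypothetical subselect form of the join query, using the table and column
-- names that appear in the plans above (calendar.stddate, calendar.calendarid).
EXPLAIN
SELECT count(*)
FROM measure m
WHERE m.fromdateid >= (SELECT calendarid FROM calendar WHERE stddate = '2015-01-01')
  AND m.fromdateid <  (SELECT calendarid FROM calendar WHERE stddate = '2015-12-31');

-- Restore the defaults afterwards.
RESET parallel_setup_cost;
RESET parallel_tuple_cost;
```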
Any ideas? Or is there a way to ask the query planner for more details about the decisions it makes?
Kind regards,
Didier