From: | "L(dot) Loewe" <lloewe(at)hotmail(dot)com> |
---|---|
To: | pgsql-novice(at)postgresql(dot)org |
Subject: | slow plan on join when adding where clause |
Date: | 2010-03-29 00:19:04 |
Message-ID: | op.vaa1x2utb4gxma@oakcreek2 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-novice |
Hi all
I have a fairly large table ("main" ~50M rows) containing a timestamp
(indexed),
a sensor_id (indexed) and some sensor data ranging over a few years.
The second table ("events" <500 rows) contains a set of interesting events
with
an events.id and two timestamps: events.start and events.stop plus some
additional data.
Now I want to join these tables to get the sensor data for those events.
The interval between start and stop is quite short for each event (usually
a
couple of minutes) so that there aren't too many rows from table "main"
matching this criteria (~1K) for each event:
SELECT * FROM events, main WHERE main.ti > events.start and main.ti <
events.stop;
EXPLAIN ANALYZE gives
Nested Loop (cost=0.00..27621542.27 rows=524505120 width=68)
(actual time=0.038..42.314 rows=69209 loops=1)
-> Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
(actual time=0.006..0.025 rows=124 loops=1)
-> Index Scan using idx_main_ti on main (cost=0.00..159306.16
rows=4229880 width=40)
(actual time=0.016..0.178 rows=558 loops=124)
Index Cond: ((main.ti > events.start) AND (main.ti < events.stop))
Total runtime: 47.682 ms
So far so good, however if I add a sensor_id constraint the planner
chooses a
different approach:
SELECT * FROM events, main WHERE main.ti > events.start and
main.ti < events.stop and sensor_id=1;
Nested Loop (cost=7309.32..1422246.30 rows=4795865 width=68)
(actual time=23427.599..23886.276 rows=772 loops=1)
Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
-> Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087
width=40)
(actual time=3771.719..9508.728 rows=490984 loops=1)
Recheck Cond: (sensor_id= 1)
-> Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94
rows=348087 width=0)
(actual time=3769.075..3769.075 rows=491102 loops=1)
Index Cond: (sensor_id= 1)
-> Materialize (cost=2.36..3.60 rows=124 width=28)
(actual time=0.000..0.010 rows=124 loops=490984)
-> Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
(actual time=0.005..0.021 rows=124 loops=1)
Total runtime: 23886.494 ms
Apparently the planner assumes that sensor_id=1 will return fewer rows than
the time constraint while the opposite is true:
sensor_id=1 -> ~ 500K , time window -> ~ 1K.
Is there a way to hint the planner to use plan 1 even with the sensor_id=1
clause or am I doing something fundamentally wrong here?
Thanks
From | Date | Subject | |
---|---|---|---|
Next Message | Carel Combrink | 2010-03-29 07:21:19 | Return row after a Insert |
Previous Message | dipti shah | 2010-03-28 09:53:48 | Designing Postgres Security Model |