From: | Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> |
---|---|
To: | Ow Mun Heng <Ow(dot)Mun(dot)Heng(at)wdc(dot)com> |
Cc: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Evil Nested Loops |
Date: | 2009-06-03 09:23:13 |
Message-ID: | dcc563d10906030223h2d2c5793w2d33940f065b6f43@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Wed, Jun 3, 2009 at 2:18 AM, Ow Mun Heng <Ow(dot)Mun(dot)Heng(at)wdc(dot)com> wrote:
> On Wed, 2009-06-03 at 01:28 -0600, Scott Marlowe wrote:
>> On Wed, Jun 3, 2009 at 12:32 AM, Ow Mun Heng <Ow(dot)Mun(dot)Heng(at)wdc(dot)com> wrote:
>> > HashAggregate (cost=8035443.21..8035445.17 rows=157 width=24)
>> > -> Nested Loop (cost=37680.95..7890528.72 rows=28982898 width=24) <<<<< suspect
>> > Join Filter: ((a.test_run_start_date_time >= date.start_time) AND (a.test_run_start_date_time <= date.end_time))
>> > -> Bitmap Heap Scan on d_trh_pbert a (cost=37677.22..1369372.99 rows=1661440 width=24)
>> > Recheck Cond: ((test_run_start_date_time >= '2009-05-08 07:00:00'::timestamp without time zone) AND (test_run_start_date_time <= '2009-05-15 06:59:59'::timestamp without time zone))
>> > -> Bitmap Index Scan on idx_d_trh_pbert_sdate (cost=0.00..37261.86 rows=1661440 width=0)
>> > Index Cond: ((test_run_start_date_time >= '2009-05-08 07:00:00'::timestamp without time zone) AND (test_run_start_date_time <= '2009-05-15 06:59:59'::timestamp without time zone))
>> > -> Materialize (cost=3.73..5.30 rows=157 width=24)
>> > -> Seq Scan on lookup_ww_date2 date (cost=0.00..3.57 rows=157 width=24)
>
>> OK, looking at your query and the plan, what you're doing is kind of this:
>>
>> 157 Rows times 1661440 Rows (cross product) = 260M or so and then you
>> filter out the 157 original rows and their matches. Note that an
>> explain ANALYZE might shed more light, but given the high cost in this
>> query for the nested loop I'm guessing the only thing you can do is
>> throw more work_mem at it. But it's fundamentally flawed in design I
>> think.
>
> The explain analyze runs >10 mins and then I just aborted it.
>
> WW49 is basically between 5/8 and 5/15, unfortunately, it's not actually
> just the base dates, it's also the time.
>
> eg: 200949|5/8/2009 7:00:00am|5/15/2009 6:59:59AM
>
> The definition of WW or a "day" is actually between
> eg: 5/8 7am to 5/9 6:59:59am
>
>
>> If you're always working with dates maybe joining on
>> date_trunc('day',test_run_start_date)=date_trunc('day',startdate')
>> with an index on both terms will work?
Well, if you could transform your 24 hour day to match date_trunc OR
write your own stable / immutable function to break it on those times.
I think it could be done. You could then get a useful index on it.
I think you'd have to use timestamps and not timestamps with timezones
to make it immutable.
Anyway, I think that would get rid of that huge nested loop.
From | Date | Subject | |
---|---|---|---|
Next Message | Sebastien FLAESCH | 2009-06-03 09:42:28 | Need for official 0.83 release |
Previous Message | Ow Mun Heng | 2009-06-03 08:18:08 | Re: Evil Nested Loops |