Re: how to change the provoke table in hash join

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Matheus de Oliveira <matioli(dot)matheus(at)gmail(dot)com>
Cc: "Huang, Suya" <Suya(dot)Huang(at)au(dot)experian(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: how to change the provoke table in hash join
Date: 2014-09-11 18:09:26
Message-ID: CAMkU=1z3j2DAzkQ4tSDYNkZR0HX1Z8Z9_u_fMpJ9tg3myq5gCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Sep 11, 2014 at 7:09 AM, Matheus de Oliveira <
matioli(dot)matheus(at)gmail(dot)com> wrote:

>
> On Wed, Sep 10, 2014 at 10:05 PM, Huang, Suya <Suya(dot)Huang(at)au(dot)experian(dot)com>
> wrote:
>
>> --plan 1, 10 seconds were spent on sequential scan on term_weekly table.
>>
>>
>>
>> dev=# explain analyze select distinct cs_id from lookup_weekly n inner
>> join term_weekly s on s.b_id=n.b_id and s.date=n.date where term in
>> ('cat'::text);
>>
>>
>>
>>
>>
>>
>> QUERY PLAN
>>
>>
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> HashAggregate (cost=2100211.06..2100211.11 rows=5 width=4) (actual
>> time=27095.470..27095.487 rows=138 loops=1)
>> ...
>>
>>
>>
>> --plan 2, only 1 second spent on index scan of term_weekly table,
>> however, as it selects the big table to do the hashing, it takes 22 seconds
>> for the hash to complete. The advantage get from index has been totally
>> lost because of this join order.
>>
>>
>>
>>
>> QUERY PLAN
>>
>>
>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> HashAggregate (cost=1429795.17..1429795.22 rows=5 width=4) (actual
>> time=22991.289..22991.307 rows=138 loops=1)
>> ...
>
>
> Am I reading something wrong here? I haven't looked all the plan, but the
> second is faster (overall), so why do you think you need a hint or change
> what the planner choose? For me looks like using the index is the best for
> this situation. Could you try running this multiple times and taking the
> min/max/avg time of both?
>

The difference in time could be a caching effect, not a reproducible
difference.

The 2nd plan uses 3GB of memory, and there might be better uses for that
memory.

Currently memory is un-costed, other than "cliff costing" once you thinks
it will exceed work_mem, which I think is a problem. Just because I will
let you use 4GB of memory if you will really benefit from it, doesn't mean
you should use 4GB gratuitously.

Suya, what happens if you lower work_mem setting? Does it revert to the
plan you want?

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Huang, Suya 2014-09-12 02:26:04 weird execution plan
Previous Message Matheus de Oliveira 2014-09-11 14:09:52 Re: how to change the provoke table in hash join