Re: 9.2.4 vs 9.3.0 query planning (sort merge join vs hash join)

From: Tim Kane <tim(dot)kane(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: 9.2.4 vs 9.3.0 query planning (sort merge join vs hash join)
Date: 2014-05-12 12:22:51
Message-ID: CF96728A.7CBC6%tim.kane@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Hmm. Interesting.

Thanks Tom, it does indeed look like the planner is evaluating and excluding
the hashed-join plan as having a higher cost. I can see this by setting
enable_mergejoin=0.
I think this may play against other aspects of the query (though only
marginally), so I can’t really compare the resulting cost metrics – but
they’re certainly close.

I’ve just now played a little more with work_mem. I had already tried
increasing work_mem to all kinds of obscene levels.

Oddly, it seems the solution here is in fact to *reduce* work_mem in order
to elicit the preferred hash-join based plan.
In fact, I needed to reduce it to as low as 64MB for this behaviour – which
seems counter-intuitive. These are not small queries, so I previously had it
pushed up to 6GB for these tasks.

For smaller datasets,I need to reduce work_mem further still in order to
obtain a hash-join plan – though the difference in execution time becomes
less of a problem at this size.

Tim

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: Monday, 12 May 2014 12:23
To: Tim Kane <tim(dot)kane(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [PERFORM] 9.2.4 vs 9.3.0 query planning (sort merge join vs
hash join)

Tim Kane <tim(dot)kane(at)gmail(dot)com> writes:
> Is there a propensity for 9.2.4 to prefer a sort-merge-join, in place of a
> hash join?

Not particularly; I don't think there's any actual difference in the cost
estimation equations between 9.2 and 9.3. The two plans you show are
close enough in estimated cost that the ordering of their costs might be
coming out differently just as a matter of random variation in statistics.

It'd be worth double-checking the work_mem setting on both systems,
though, as (IIRC) an undersized work_mem hurts the estimate for hashes
more than for sorts.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2014-05-12 16:02:12 Re: 9.2.4 vs 9.3.0 query planning (sort merge join vs hash join)
Previous Message Tom Lane 2014-05-12 11:23:00 Re: 9.2.4 vs 9.3.0 query planning (sort merge join vs hash join)