Re: hash join vs nested loop join

From: "Kevin Grittner" <kgrittn(at)mail(dot)com>
To: "Huan Ruan" <huan(dot)ruan(dot)it(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: hash join vs nested loop join
Date: 2012-12-20 14:06:29
Message-ID: 20121220140629.14720@gmx.com
Lists: pgsql-performance

Huan Ruan wrote:
> Kevin Grittner wrote:

>> Frankly, at 12 microseconds per matched pair of rows, I think
>> you're doing OK.
>
> This plan is the good one. I want the index scan nested loop join, and
> this is only achieved after changing all these costing factors. Before
> that, it was a hash join and was very slow.
>
> However, I'm worried about the config changes being too 'extreme', i.e.
> both sequential I/O and random I/O have the same cost, and that cost is
> only 0.1. So, I was more wondering why I have to make such dramatic
> changes to convince the optimiser to use NL join instead of hash join.
> Also, I'm not sure yet what impact these changes will have on other
> queries, e.g. will a query that's fine with a hash join now choose an NL
> join and run slower?

I understand the concern, but PostgreSQL doesn't yet have a knob to
turn for "cache hit ratio". You essentially need to build that into
the page costs. Since your cache hit ratio (between shared buffers
and the OS) is so high, the cost of page access relative to CPU
costs has declined and there isn't any effective difference between
sequential and random access. As the level of caching changes, you
may need to adjust. In one production environment where there was
significant caching, but far enough from 100% to matter, we tested
various configurations and found the fastest plans being chosen
with seq_page_cost = 0.3 and random_page_cost = 0.5. Tune to your
workload.
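
One low-risk way to test candidate values is to set them for a single
session and compare plans before changing postgresql.conf, so other
queries are unaffected while you experiment. A minimal sketch, assuming
hypothetical tables small_table and big_table (substitute the real join);
the cost values are the ones from the environment described above, not a
recommendation for yours:

```sql
-- Session-local settings: only this connection sees them.
SET seq_page_cost = 0.3;
SET random_page_cost = 0.5;

-- Hypothetical query; table and column names are placeholders.
-- ANALYZE runs the query and reports actual times; BUFFERS shows
-- how many pages came from shared buffers versus reads.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
  FROM small_table s
  JOIN big_table b ON b.id = s.big_id;

-- Go back to the values configured in postgresql.conf.
RESET seq_page_cost;
RESET random_page_cost;
```

Running the same EXPLAIN for queries that are currently fine with a hash
join should show whether they switch plans under the new costs.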

-Kevin
