Re: Understanding Hash Join performance

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "srobertjames" <srobertjames(at)gmail(dot)com>, "pgsql-performance" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Understanding Hash Join performance
Date: 2011-06-02 14:57:25
Message-ID: 4DE75E85020000250003DFEB@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Robert James <srobertjames(at)gmail(dot)com> wrote:

> A query I has spends a long time on Hash Joins (and Hash Left
> Joins).

To submit a post which gives us enough information to help you speed
up that query, please read this page:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

> I have a few questions:
>
> 1. When does Postgres decide to do a Hash Join, over another type
> of Join?
> 2. Do Hash Joins normally perform poorly? What can I do to speed
> them up?
> 3. What can I do to enable Postgres to use a faster type of join?

Questions this general can only be answered in a general way, so
here goes.

The planner doesn't choose a particular plan type, exactly -- it
generates a lot of alternative plans,, basically looking at all the
ways it knows how to retrieve the requested set of data, and
estimates a cost for each plan based on available resources and
adjustable costing factors. It will choose the plan with the lowest
estimated cost. There are many situations where a hash join is
faster than the alternatives. If it's using one where another
alternative is actually faster, it's not a matter of "enabling a
faster join type" -- it's a matter of setting your cost factors to
accurately reflect the real costs on your system.

You can generally make hash joins faster by increasing work_mem, but
that tends to cause data to be pushed from cache sooner and can run
you out of memory entirely, so it must be tuned carefully. And the
planner does take the size of work_mem and the expected data set
into consideration when estimating the cost of the hash join.

-Kevin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Shaun Thomas 2011-06-02 15:34:06 Re: Problem query
Previous Message Kevin Grittner 2011-06-02 13:47:02 Re: Problem query