From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-03 13:09:11
Message-ID: 20090403130911.GL29341@it.is.rice.edu
Lists: pgsql-hackers
On Fri, Apr 03, 2009 at 08:03:33AM -0400, Robert Haas wrote:
> On Fri, Apr 3, 2009 at 1:48 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> > Robert Haas wrote:
> >>
> >> While investigating some performance problems recently I've had cause
> >> to think about the way PostgreSQL uses hash joins. So here are a few
> >> thoughts. Some of these have been brought up before.
> >>
> >> 1. When the hash is not expected to spill to disk, it preserves the
> >> pathkeys of the outer side of the join. If the optimizer were allowed
> >> to assume that, it could produce significantly more efficient query
> >> plans in some cases. The problem is what to do if we start executing
> >> the query and find out that we have more stuff to hash than we expected,
> >> such that we need multiple batches? Now the results won't be sorted.
> >> I think we could handle this as follows: don't count on the hash join
> >> to preserve pathkeys unless it helps, and only rely on it when it
> >> seems as if the hash table will still fit even if it turns out to be,
> >> say, three times as big as expected. But if you are counting on the
> >> hash join to preserve pathkeys, then pass that information to the
> >> executor. When the executor is asked to perform a hash join, it will
> >> first hash the inner side of the relation. At that point, we know
> >> whether we've successfully gotten everything into a single batch or
> >> not. If we have, perform the join normally. If the worst has
> >> happened and we've gone multi-batch, then perform the join and sort
> >> the output before returning it. The performance will suck, but at
> >> least you'll get the right answer.
> >>
> >> Previous in-passing reference to this idea here:
> >> http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php
> >
> > Hmm, instead of sorting the output if the worst happens, a final merge
> > step, as in a merge sort, would be enough.
>
> Yeah - good point.
>
> >> 2. Consider building the hash table lazily. I often see the query
> >> planner pick a hash join over a nested inner indexscan because it
> >> thinks that it'll save enough time making hash probes rather than
> >> index probes to justify the time spent building the hash table up
> >> front. But sometimes the relation that's being hashed has a couple
> >> thousand rows, only a tiny fraction of which will ever be retrieved
> >> from the hash table. We can predict when this is going to happen
> >> because n_distinct for the outer column will be much less than the
> >> size of the inner rel. In that case, we could consider starting with
> >> an empty hash table that effectively acts as a cache. Each time a
> >> value is probed, we look it up in the hash table. If there's no
> >> entry, we use an index scan to find the matching rows and insert them
> >> into the hash table. Negative results must also be cached.
> >
> > Yeah, that would be quite nice. One problem is that our ndistinct estimates
> > are not very accurate.
>
> Well, the right solution to that problem is to fix our ndistinct estimates. :-)
Also, as we saw with the hash index build performance patch, it would be
better to set the initial hash table size larger than needed, to avoid the
inter-page shuffle if the size estimate turns out to be wrong.
Ken