Quick Links

Re: Okay to tighten definition of oprcanhash?

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Okay to tighten definition of oprcanhash?
Date:	2002-12-20 21:29:46
Message-ID:	87of7gbcqd.fsf@stark.dyndns.tv
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> I have been looking into the possibility of using a hashtable to speed
> up "x IN (SELECT y FROM ...)" operations.

That's certainly one of the join types that Oracle can perform, and it's
frequently by far the fastest.

I'm not sure but I think the way Oracle optimizes subselects is by
transforming them into the equivalent join. Or rather, probably by
transforming both joins and subselects into an equivalent internal
representation.

This applies equally to things like
WHERE x=(select..)
as well as things like
WHERE x IN (select...)

The former is exactly equivalent to a join with an assertion of uniqueness.
The latter is a little different but still equivalent to a join with special
behaviour in case of duplicates. In many cases the database will have a
constraint telling it that no duplicates will appear in which cases it should
be able to optimize out any extra work anyways.

I know when I guided less experienced SQL programmers I urged them not to
worry about which form would perform better, only which form more clearly
expressed the result set they wanted. I promised them the database should
produce the same query plan for equivalent queries regardless of the form
chosen to express that in.

That was almost always true for Oracle. I generally found it impossible to
optimize a query merely by changing it from one equivalent form to another.
Oracle nearly always produced exactly the same plan. I always had to either
find a non-equivalent form that I knew would produce the same results only
because of extra information I had about the data, or add optimizer hints.

So is there some more general internal representation that can represent all
three of these cases in a consistent manner? It seems more powerful to
implement hash joins in a way that helps normal join queries as well as
particular subselect forms of queries.

For what it's worth my limited experience so far with postgres is that what
you're talking about is sorely needed. I'm having trouble getting queries that
I would write without thinking twice about on Oracle to perform reasonably on
postgres even after trying all kinds of contortions. And they're precisely the
types of queries that would be helped by hash joins.

--
greg

In response to

Okay to tighten definition of oprcanhash? at 2002-12-20 20:28:45 from Tom Lane

Responses

Re: Okay to tighten definition of oprcanhash? at 2002-12-20 21:48:39 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2002-12-20 21:48:39	Re: Okay to tighten definition of oprcanhash?
Previous Message	Diego T.	2002-12-20 20:41:34