From: | Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Design notes for EquivalenceClasses |
Date: | 2007-01-18 00:13:20 |
Message-ID: | Pine.LNX.4.58.0701181052040.25798@linuxworld.com.au |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 17 Jan 2007, Tom Lane wrote:
> strict. However, we also allow equivalence clauses that appear below the
> nullable side of an outer join to form EquivalenceClasses; for these
> classes, the interpretation is that either all the values are equal, or
> all (except pseudo-constants) have gone to null. (This requires a
> limitation that non-constant members be strict, else they might not go
> to null when the other members do.) Consider for example
>
> SELECT *
> FROM a LEFT JOIN
> (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
> ON a.x = ss.y
> WHERE a.x = 42;
>
> We can form the below-outer-join EquivalenceClass {b.y c.z 10} and thereby
> apply c.z = 10 while scanning c. (The reason we disallow outerjoin-delayed
> clauses from forming EquivalenceClasses is exactly that we want to be able
> to push any derived clauses as far down as possible.) But once above the
> outer join it's no longer necessarily the case that b.y = 10, and thus we
> cannot use such EquivalenceClasses to conclude that sorting is unnecessary
> (see discussion of PathKeys below). In this example, notice also that
> a.x = ss.y (really a.x = b.y) is not an equivalence clause because its
> applicability to b is restricted by the outer join; thus we do not make
> the mistake of concluding b.y = 42, even though we do have an equivalence
> class for {a.x 42}.
This seems to be correct. A nice improvement.
> With a little further thought, it becomes apparent that nestloop joins
> can also produce sorted output. For example, if we do a nestloop join
> between outer relation A and inner relation B, then any pathkeys relevant
> to A are still valid for the join result: we have not altered the order of
> the tuples from A. Even more interesting, if there was an equivalence clause
> A.X=B.Y, and A.X was a pathkey for the outer relation A, then we can assert
> that B.Y is a pathkey for the join result; X was ordered before and still
> is, and the joined values of Y are equal to the joined values of X, so Y
> must now be ordered too. This is true even though we used neither an
> explicit sort nor a mergejoin on Y. (Note: hash joins cannot be counted
> on to preserve the order of their outer relation, because the executor
> might decide to "batch" the join, so we always set pathkeys to NIL for
> a hashjoin path.) Exception: a RIGHT or FULL join doesn't preserve the
> ordering of its outer relation, because it might insert nulls at random
> points in the ordering.
I was thinking about this, but in relation to hash joins. A hash join
cannot be guaranteed to produce output sorted according to the pathkey of
the outer relation (as explained in the existing README). I wonder,
however, if it might be useful for hash join to pass a hint that the
output is known ordered (i.e., the join was not split into multiple
batches).
Thanks,
Gavin
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Davis | 2007-01-18 00:37:53 | Re: Function execution costs 'n all that |
Previous Message | Alvaro Herrera | 2007-01-17 22:41:19 | Re: [COMMITTERS] pgsql: Implement width_bucket() for the |