From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: IN vs EXISTS equivalence |
Date: | 2008-08-14 17:50:09 |
Message-ID: | 1218736209.5343.537.camel@ebony.2ndQuadrant |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 2008-08-08 at 16:23 -0400, Tom Lane wrote:
> NOT IN is a lot trickier,
> for the same reason that typically trips up novices who try to use it:
> if any row of the subselect produces a NULL comparison result, then it
> is impossible for the NOT IN to result in TRUE, which means that it
> does not function as a standard antijoin. I thought about optimizing
> it only in the case where we can prove that the subselect outputs and
> the compared-to values are known NOT NULL (which in typical cases we
> could prove by looking for NOT NULL constraints on those table
> columns). The trouble with this is that that's not a sufficient
> condition: you must also assume that the comparison operator involved
> never yields NULL for non-null inputs. That might be okay for btree
> comparison functions but it's not a very comfy assumption in general;
> we certainly haven't got any explicit knowledge that any functions are
> guaranteed to act that way. So this case might be worth doing later
> but I'm not feeling excited about it. We generally tell people to
> avoid NOT IN and I'm happy to keep on saying that.
Just found this comment, after reading what you said on other thread
about NOT IN.
NOT IN is a serious performance issue for most people. We simply can't
say to people "you were told not to".
If we can fix it easily for the majority of cases, we should. We can't
let the "it won't work in certain cases" reason prevent various
optimizations from going in. There are tons of places where we say "XXX
needs later improvement" in code comments. So lets do that here also. It
certainly wouldn't be the first optimization/feature that went into code
in a restricted way that didn't work for all cases: hash joins, ANALYZE,
partial indexes etc..
Anybody that is writing complex SQL with user defined operators knows
enough to re-write their queries correctly, so there will be almost no
negative effect from making the NOT IN optimisation a special case. And
if there is an effect, the people effected can fix the problem.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2008-08-14 18:07:03 | Re: autovacuum and TOAST tables |
Previous Message | Kevin Grittner | 2008-08-14 17:45:35 | Re: WIP: patch to create explicit support for semi and anti joins |