Quick Links

Re: select distinct in a subquery bug/problem

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Dan Halbert" <halbert(at)halwitz(dot)org>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: select distinct in a subquery bug/problem
Date:	2012-08-11 05:52:38
Message-ID:	4101.1344664358@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

"Dan Halbert" <halbert(at)halwitz(dot)org> writes:
> 1. select count(t1_id) from t1 where t1_id not in (select distinct t1_id from t2 limit 1103) ==> 13357 [CORRECT result]
> 2. select count(t1_id) from t1 where t1_id not in (select distinct t1_id from t2 limit 1104) ==> 0 [WRONG result; should be close to 13357]

I'm betting the 1104'th row in t2 is a null. NOT IN generally doesn't
do what people expect when there are nulls in the sub-select ... but
it is acting per spec.

The apparent dependency on which plan is chosen is illusory and stems
from the fact that "select distinct ... limit" gives you an unspecified
set of rows. I think the indexscan/unique plan would produce nulls
last, so you'd not see them as long as the limit was less than the
number of distinct values ... but if the distinct is done via hash
aggregation, the null entry could come out much earlier.

regards, tom lane

In response to

select distinct in a subquery bug/problem at 2012-08-11 03:32:35 from Dan Halbert

Browse pgsql-general by date

	From	Date	Subject
Next Message	Laszlo Fogas	2012-08-11 07:33:29	Re: slowness what only full vacuum can solve
Previous Message	Dan Halbert	2012-08-11 03:32:35	select distinct in a subquery bug/problem