Re: BUG #13908: Query returns too few rows

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Seth P <seth-p(at)outlook(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13908: Query returns too few rows
Date: 2016-02-04 23:04:54
Message-ID: CAKFQuwYTkgnYEAc5woEnhsOJPBqD50JU_wDn4iQPwR2x9Se85A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Feb 4, 2016 at 2:56 PM, Seth P <seth-p(at)outlook(dot)com> wrote:

> Meanwhile, your results are curious not just because they differ from
> mine, but because you observe the DISTINCT removing *any* duplicates.
> There shouldn't be any, since rrr.idx is a unique serial (idx integer NOT
> NULL DEFAULT nextval('rrr_idx_seq'::regclass)). In particular, the
> following query returns no rows:
>
> SELECT rrr.idx FROM public.rrr GROUP BY rrr.idx HAVING count(*)>1
>
> Put another way, in your database, can you find the duplicates being
> removed by DISTINCT
>
B
​ut the query is written as a series of INNER JOINs against "rrr" and so
its output is not constrained by UNIQUE(rrr.idx). So it very well may
return multiple rows for a given rrr.idx but when you restrict the output
to "rrr.*" and add DISTINCT the result is fewer records since now you only
have one of each rrr.idx.



​I made one possibly meaningful change when I reported my discrepancy
earlier. The query with the ​416,075 DISTINCT row count uses "SELECT
DISTINCT rrr" while the one returning 415,874 uses "SELECT DISTINCT rrr.*".

​That said, running the following returns zero records:

​SELECT
rrr.idx, count(*)
FROM public.rrr
INNER JOIN public.fff
ON fff.idx = rrr.fff_idx
INNER JOIN public.uuu
ON uuu.universe_hash = 5188205190738336870 AND
uuu.bababa = rrr.bababa
WHERE
fff.file_name_model_ver = '100' AND
fff.file_name_date BETWEEN '2005-03-01' AND '2005-07-30'
GROUP BY rrr.idx
HAVING count(*) > 1

​So indeed the query does not result in multiple rrr.idx values being
output for the DISTINCT to remove.

​David J.​

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2016-02-04 23:28:32 Re: BUG #13908: Query returns too few rows
Previous Message tarasbob 2016-02-04 22:13:23 BUG #13918: Simple query with Having clause returns incorrect results