Re: [SQL] Oddities with NULL and GROUP BY

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-sql(at)postgreSQL(dot)org
Subject: Re: [SQL] Oddities with NULL and GROUP BY
Date: 1999-07-10 20:03:17
Message-ID: 5126.931636997@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I believe I have finally resolved this old bug from May:

secret <secret(at)kearneydev(dot)com> writes:
> GROUPing in PostgreSQL w/NULLs works just fine when there is only 1
> column, however when one throws 2 in, the 2nd one having NULLs it starts
> failing. Your example demonstrates the right answer for 1 group by
> column, try it with 2 and I expect 6.5beta1 will fail as 6.4.2 does.

Actually, I believe that the problem was seen when you sort/group by
multiple columns and there are nulls in the *earlier* columns. The bug
I just fixed in the sort logic was that it would stop comparing as soon
as it hit a null column. Thus (NULL,1) would sort as equal to (NULL,2)
whereas you'd obviously like it to sort as smaller.

The reason it affected GROUP BY is that the sort could produce results
like
(NULL,1)
(NULL,1)
(NULL,2)
(NULL,1)
Because of the comparison bug, the sorter thought these tuples were
all equal-keyed and so it didn't worry about what order they'd come
out in. But then the adjacent-duplicate-merging step would produce
(NULL,1) --- 2 tuples represented by this group
(NULL,2)
(NULL,1)
which is the wrong answer.

The fix is to continue comparing columns when both tuples have a null
in one column, rather than stopping and declaring them equal. This
is in current CVS sources and will be in 6.5.1.

The bug cannot be observed if you use test cases that only sort/group
on one column...

regards, tom lane

Browse pgsql-sql by date

  From Date Subject
Next Message Oleg Bartunov 1999-07-10 20:16:39 SELECT DISTINCT question
Previous Message George Young 1999-07-09 20:56:25 class as a table column