Re: [HACKERS] Oops, I seem to have changed UNION's behavior

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Oops, I seem to have changed UNION's behavior
Date: 1999-07-07 13:46:14
Message-ID: 25787.931355174@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us> writes:
> Can someone comment on this? Is it still an issue with cnf'ify removing
> duplicate cases?

This is still an open bug. I looked at it and concluded that a clean
solution probably requires modifying the parsetree structure for UNIONs.
(Currently, UNION, INTERSECT, EXCEPT are converted to OR, AND, AND NOT
structures with queries as arguments ... which would be OK, except that
the semantics aren't really the same, and the difference between UNION
and UNION ALL doesn't fit into that operator set at all.)

That looked like a lot of work for what is surely a low-priority bug,
so I haven't gotten around to it yet. Need more round tuits.

In the meantime, it should be on the TODO list:
* SELECT foo UNION SELECT foo is incorrectly simplified to SELECT foo

regards, tom lane

>> The equal() updates I installed yesterday (to fix the "don't know
>> whether nodes of type 600 are equal" problem) have had an unintended
>> side effect.
>>
>> Am I right in thinking that UNION (without ALL) is defined to do a
>> DISTINCT on its result, so that duplicates are removed even if the
>> duplicates both came from the same source table? That's what 6.4.2
>> does, but I do not know if it's strictly kosher according to the SQL
>> spec.
>>
>> If so, the code is now busted, because with the equal() extension in
>> place, cnfify() is able to recognize and remove duplicate select
>> clauses. That is, "SELECT xxx UNION SELECT xxx" will be folded to
>> just "SELECT xxx" ... and that doesn't mean the same thing.
>>
>> An actual example: given the data
>>
>> play=> select a from tt;
>> a
>> -
>> 1
>> 1
>> 2
>> 3
>> (4 rows)
>>
>> Under 6.4.2 I get:
>>
>> play=> select a from tt union select a from tt;
>> a
>> -
>> 1
>> 2
>> 3
>> (3 rows)
>>
>> Note lack of duplicate "1". Under current sources I get:
>>
>> ttest=> select a from tt union select a from tt;
>> a
>> -
>> 1
>> 1
>> 2
>> 3
>> (4 rows)
>>
>> since the query is effectively reduced to just "select a from tt".
>>
>> Assuming that 6.4.2 is doing the Right Thing, I see two possible fixes:
>> (1) simplify equal() to say that two T_Query nodes are never equal, or
>> (2) modify the planner so that the "select distinct" operation is
>> inserted explicitly, and will thus happen even if the UNIONed selects
>> are collapsed into just one.
>>
>> (1) is a trivial fix of course, but it worries me --- maybe someday
>> we will need equal() to give an honest answer for Query nodes.
>> But I don't have the expertise to apply (2), and it seems like rather
>> a lot of work for a boundary case that isn't really interesting in
>> practice.
>>
>> Comments? *Is* 6.4.2 behaving according to the SQL spec?
>>
>> regards, tom lane
>>
>>

> --
> Bruce Momjian | http://www.op.net/~candle
> maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
> + If your life is a hard drive, | 830 Blythe Avenue
> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 1999-07-07 13:54:11 Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs
Previous Message Tom Lane 1999-07-07 13:29:28 Re: [HACKERS] Postgres Speed or lack thereof