Re: Synopsis of SELECT statement: UNION, INTERSECTION, EXCEPT

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Euler Taveira <euler(at)timbira(dot)com(dot)br>
Cc: dirk(dot)lattermann(at)leanix(dot)net, pgsql-docs(at)lists(dot)postgresql(dot)org
Subject: Re: Synopsis of SELECT statement: UNION, INTERSECTION, EXCEPT
Date: 2018-03-16 05:17:00
Message-ID: 28115.1521177420@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs

Euler Taveira <euler(at)timbira(dot)com(dot)br> writes:
> 2018-03-15 7:18 GMT-03:00 PG Doc comments form <noreply(at)postgresql(dot)org>:
>> The SYNOPSIS section of the "SELECT" SQL command contains the line
>> [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
>> (with a boldface "select"), but it is not clear what is meant by that
>> "select".

> It is a bug in the synopsis. UNION et al cannot contain some elements
> (such as ORDER BY) that is allowed for a sub-select. The attached
> patch replace "select" with the correct element ("select_statement").

Well, "select_statement" isn't defined in the synopsis either.
We do define it implicitly in the UNION/INTERSECT/EXCEPT subsections,
but is that enough?

I think the parenthetical remark in the "UNION Clause" section is a bit
shaky too:

(ORDER BY and LIMIT can be attached to a subexpression if it is
enclosed in parentheses. Without parentheses, these clauses will be
taken to apply to the result of the UNION, not to its right-hand input
expression.)

This would seem to imply that

SELECT * FROM foo ORDER BY x UNION SELECT * FROM bar

is legal and means the same as

(SELECT * FROM foo UNION SELECT * FROM bar) ORDER BY x

which is wrong.

Also, whether or not that wording is right, it's not duplicated in the
INTERSECT or EXCEPT headings, where logically it ought to appear too.
We could duplicate it there maybe, but I'm starting to feel like this is
just doubling down on a bad documentation design.

The long and short of it is that UNION et al were wedged into the initial
synopsis in a way that's at best misleading and at worst a lie. But
I'm not sure how to make that better without also making it a lot more
confusing. A pedantically correct syntax synopsis would look something
like

[ WITH [ RECURSIVE ] with_query [, ...] ]
select_stmt [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_stmt ...]

where select_stmt is

SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
[ * | expression [ [ AS ] output_name ] [, ...] ]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition [, ...] ]
[ WINDOW window_name AS ( window_definition ) [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ LIMIT { count | ALL } ]
[ OFFSET start [ ROW | ROWS ] ]
[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
[ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]

and that's still not right because ORDER BY et al can't be attached to a
select_stmt that's the argument of a set operation, so really we'd need
a couple of levels of nonterminals before we get down to the basic
"SELECT expression FROM ..." part. Nor has the use of parentheses been
mentioned yet.

If you look at either the SQL standard's syntax diagrams or our actual
bison grammar, they're both unreasonably complicated. Novices would
not thank us for reproducing that in full detail in the basic SELECT
syntax summary, and I'm not sure experts would either.

So, surely there's room for improvement here, but I'm not certain what
it'd look like. Maybe we should split out the discussion of set-operation
syntax altogether? How exactly?

regards, tom lane

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message PG Doc comments form 2018-03-16 11:55:56 prog
Previous Message Euler Taveira 2018-03-16 04:56:32 Re: "IS NOT DOCUMENT" is missing