Re: select random order by random

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Richard Huxton" <dev(at)archonet(dot)com>, "Lee Keel" <lee(dot)keel(at)uai(dot)com>, "piotr_sobolewski" <piotr_sobolewski(at)o2(dot)pl>, pgsql-general(at)postgresql(dot)org
Subject: Re: select random order by random
Date: 2007-11-01 22:57:45
Message-ID: 16867.1193957865@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> So I guess having the parser do this substitution kind of makes sense
> if you're thinking about things the way the spec does. It doesn't make
> much sense if you're thinking the way Postgres does of having
> arbitrary expressions there independent of what's in the select list.

Again: this is not "Postgres vs the spec", it is "SQL92 vs SQL99".
I draw your attention to the relevant text...

SQL92:

<order by clause> ::=
ORDER BY <sort specification list>

<sort specification list> ::=
<sort specification> [ { <comma> <sort specification> }... ]

<sort specification> ::=
<sort key> [ <collate clause > ] [ <ordering specification> ]

<sort key> ::=
<column name>
| <unsigned integer>

<ordering specification> ::= ASC | DESC

...

10)If ORDER BY is specified, then each <sort specification> in the
<order by clause> shall identify a column of T.

Case:

a) If a <sort specification> contains a <column name>, then T
shall contain exactly one column with that <column name> and
the <sort specification> identifies that column.

b) If a <sort specification> contains an <unsigned integer>,
then the <unsigned integer> shall be greater than 0 and not
greater than the degree of T. The <sort specification> iden-
tifies the column of T with the ordinal position specified by
the <unsigned integer>.

(T is the table emitted by the SELECT.)

SQL99:

<order by clause> ::=
ORDER BY <sort specification list>

<sort specification list> ::=
<sort specification> [ { <comma> <sort specification> }... ]

<sort specification> ::=
<sort key> [ <collate clause> ] [ <ordering specification> ]

<sort key> ::=
<value expression>

<ordering specification> ::= ASC | DESC

18) If an <order by clause> is specified, then:

a) Let K(i) be the <sort key> contained in the i-th <sort
specification>.

b) Let DT be the declared type of K(i).

c) If DT is a user-defined type, then the comparison form of DT
shall be FULL.

d) K(i) shall not be a <literal>.

e) If QE is a <query expression body> that is a <non-join query
expression> that is a <non-join query term> that is a <non-
join query primary> that is a <simple table> that is a <query
specification>, then the <cursor specification> is said to be
a simple table query.

f) Case:

i) If <sort specification list> contains any <sort key> K(i)
that contains a column reference to a column that is not a
column of T, then:

1) The <cursor specification> shall be a simple table
query.

2) Case:

A) If K(i) is not equivalent to a <value expression>
immediately contained in any <derived column> in the
<select list> SL of <query specification> QS contained
in QE, then:

I) T shall not be a grouped table.

II) QS shall not specify the <set quantifier> DISTINCT
or directly contain one or more <set function
specification>s.

III) Let C(j) be a column that is not a column of T and
whose column reference is contained in some K(i).

IV) Let SKL be the list of <derived column>s that are
<column name>s of column references to every C(j).
The columns C(j) are said to be extended sort key
columns.

V) Let TE be the <table expression> immediately
contained in QS.

VI) Let ST be the result of evaluating the <query
specification>:

SELECT SL, SKL FROM TE

B) Otherwise:

I) Let ST be T.

II) For every <derived column> DC(e) of SL that is
equivalent to K(i), if DC(e) has a <column name>,
then let CN(e) be that <column name>; otherwise:

1) Let CN(e) be an implementation-defined <column
name> that is not equal to any <column name> of
any column of ST.

2) DC(e) is effectively replaced by DE(e) AS CN(e)
in the <select list> of ST, where DE(e) is the
<derived element> of DC(e).

III) K(i) is effectively replaced by CN(e).

ii) Otherwise, let ST be T.

g) ST is said to be a sort table.

h) K(i) is a <value expression>. The <value expression> shall
not contain a <subquery> or a <set function specification>,
but shall contain a <column reference>.

i) Let X be any <column reference> directly contained in K(i).

ii) If X does not contain an explicit <table or query name> or
<correlation name>, then K(i) shall be a <column name> that
shall be equivalent to the name of exactly one column of
ST.

NOTE 287 - A previous version of ISO/IEC 9075 allows <sort
specification> to be a <signed integer> to denote a column
reference of a column of T. That facility no longer exists.
See Annex E, "Incompatibilities with ISO/IEC 9075:1992 and
ISO/IEC 9075-4:1996".

In the usual tradition of SQL99, the spec text is enormously less
readable than SQL92 was, but I *think* this says nearly the same thing
as what we do: a plain column reference in ORDER BY is first sought as
an output column name, and failing that sought as a column name of one
of the input tables. They are more restrictive than we are but that's
OK.

For the particular issue at hand here, it seems to me that 18.f.i.2.B
dictates that a <sort key> matching an output column be treated as a
reference to the column, not as an independently evaluated expression.
Admittedly they are not talking about volatile functions per se, but
I think there's some defense here for the way our parser does it.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message brian 2007-11-01 23:02:17 Re: Number to Words Conversion
Previous Message Gregory Stark 2007-11-01 22:11:24 Re: select random order by random