From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Fixing the representation of ORDER BY/GROUP BY/DISTINCT |
Date: | 2008-08-01 00:54:49 |
Message-ID: | 25811.1217552089@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
So while I was fooling with Steve Midgley's problem I got a bit of a bee
in my bonnet about the way that the parser emits ORDER BY, GROUP BY,
and DISTINCT lists.
* Currently, ORDER BY and DISTINCT use lists of SortClause, while GROUP
BY is a list of GroupClause --- but these are actually the same struct,
and there's a fair amount of code that relies on that. The advantage of
them being the same is that it's easy to compare them when considering
sort-and-uniq-style grouping plans. Except it's not easy enough: I
tried to use a list_member test in one place, and of course it didn't
work because equal() never sees distinct node tags as equal. So I'm
thinking we ought to unify the two node types logically as well as
physically, and just have one node type (SortGroupClause, maybe).
* The current representation is fine for ORDER BY but leaves something
to be desired for GROUP BY and DISTINCT: there isn't anyplace to specify
the equality operator for a hash-based grouping operation. This results
in repeat lookups in the planner to fetch information that was readily
available at parse time. But what's worse IMHO is that we simply cannot
represent a grouping query for a datatype that hasn't got a btree sort
opclass --- even though we could implement it, by means of hashing, if
the type has a hashable equality operator. (This isn't academic: it's
easy to imagine datatypes that have equality but no natural linear sort
order. A practical example is XID, which in fact has a hash opclass
but not a btree opclass, because it violates the law of transitivity.)
So what I'm thinking we want is something like
typedef struct SortGroupClause
{
NodeTag type;
Index tleSortGroupRef; /* reference into targetlist */
Oid eqop; /* the equality operator ('=' op) */
Oid sortop; /* the ordering operator ('<' op), or 0 */
bool nulls_first; /* do NULLs come before normal values? */
} SortGroupClause;
In an ORDER BY item the sortop and nulls_first flag *must* be supplied.
The eqop isn't really useful for ORDER BY, but it's trivial to get when
we get the sortop, and always including it simplifies comparisons to
GROUP/DISTINCT items. In GROUP BY/DISTINCT items, the eqop *must* be
supplied, and if it is a btree equality operator then the associated
sortop should be given as well. We'd continue the current practice of
duplicating the ordering properties of any ORDER BY clause given for the
same targetlist item, so that compatible ordering and grouping items are
just equal().
* Another thing I've gotten tired of is the difficulty of telling
DISTINCT from DISTINCT ON in the parsed representation. Surely we can
afford to stick another bool field into Query to make that distinction
unambiguous. This is important for making the world safe for hashed
DISTINCT, since AFAICS we probably can't ever use hashing for DISTINCT
ON --- its definition is too dependent on the assumption of sorting.
Barring objections, I'm going to go off and make this happen. It won't
immediately result in supporting hashed DISTINCT, but it's another
necessary step on the road to that.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | ITAGAKI Takahiro | 2008-08-01 02:53:25 | Re: Copy storage parameters on CREATE TABLE LIKE/INHERITS |
Previous Message | Alvaro Herrera | 2008-07-31 22:19:29 | Re: Review: DTrace probes (merged version) ver_03 |