From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
---|---|
To: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | sorting table columns |
Date: | 2011-12-20 20:53:34 |
Message-ID: | 1324412114-sup-9608@alvh.no-ip.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
I've been trying to implement the holy grail of decoupling
logical/physical column sort order representation, i.e., the feature
that lets the server have one physical order, for storage compactness,
and a different "output" order that can be tweaked by the user. This
has been discussed many times; most recently, I believe, here:
http://archives.postgresql.org/pgsql-hackers/2007-02/msg01235.php
with implementation details here:
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00983.php
The idea described there by Tom, and upon which I formed a vague
implementation plan in my head, is that I was to look for all uses of
an "attnum", and then replace it by either "attlognum" (i.e. the
user-visible sort identifier) or "attphysnum" (i.e. the order of
attributes as stored on disk). This turned out to be far from the
truth; the way things really work is that tupledescs are constructed
from catalogs, which are then converted into target lists, and those are
turned back into tupledescs in some places or into tupleslots in others.
So the real implementation is about making sure that we read the column
order ids, and preserve them appropriately while the query travels
through parser, rewriter, planner, executor, and to client. To this
end, I added members to nodes Var and TargetEntry; this lets me carry
the catalog data down. This isn't particularly complex, though it was
quite a challenge figuring out exactly the changes that made sense.
Soon thereafter I noticed that the column sort order needs to be
preserved in RangeTblEntry too, so I added a list of ints there to map
from logical to canonical.
So far so good. I made a few simple cases work: "select * from foo"
works correctly, of course, as do joins using ON, USING and NATURAL.
See attached patch (very much a WIP). (Note that for business reasons
there is no SQL syntax to fool around with logical column numbers; what
I do to test this is create a table and then UPDATE the
pg_attribute.attlognum entries to create a different order. Also I
haven't gotten into the business of handling a different physical
order.)
My next test case was a SQL function. There, things crashed and burned
immediately and it took me some time to realize that the reason for this
is the DestReceiver stuff: the patch I wrote to handle the basic cases
simply sorts attrs in logical order to pass to the receiveSlot function
(printtup in those basic cases), but this obviously affects how the
tuple is passed to other DestReceivers too. So the function DR is
getting the attributes in logical order, and then trying to stuff them
into a tuplestore as a minimaltuple. But the underlying code tries to
compute datum lengths using the TupleDesc and it doesn't use logical
order, but just canonical (catalog) order, which doesn't match the data
values. So it crashes.
So at this point I'm at a crossroads. One idea was to avoid sending
tuples in logical order unless the DR explicitely requests for it. So
the printtup DR would set a flag so that ExecutePlan would send tuples
in logical order; other DRs would not set this flag, and executor would
behave normally. What's not clear to me is that this is feasible at
all, because the order in which attrs are sent out are defined pretty
early in parser stages, so maybe we don't know enough about the DR yet.
Another idea was to modify the rest of the DRs so that they are aware
that the tuples they are being passed are in logical order.
Maybe this is all wrong and I need to take a completely different
approach. In particular, if I'm completely on the wrong track about
this, I want to know as soon as possible!
Ideas? Opinions?
--
Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Attachment | Content-Type | Size |
---|---|---|
alter-column-order.patch | application/octet-stream | 47.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2011-12-20 21:08:04 | Re: deferrable triggers |
Previous Message | Christopher Browne | 2011-12-20 20:52:04 | Re: JSON for PG 9.2 |