From: | Jesper Krogh <jesper(at)krogh(dot)cc> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Martijn van Oosterhout <kleptog(at)svana(dot)org>, David Fetter <david(at)fetter(dot)org>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Evaluation of secondary sort key. |
Date: | 2011-04-18 05:25:47 |
Message-ID: | 4DABCB5B.4080606@krogh.cc |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2011-04-09 18:54, Tom Lane wrote:
> I think that would be a positive disimprovement. The current design
> guarantees that volatile sort expressions are evaluated exactly once,
> in the order the rows are read from the data source. There would be no
> guarantees at all, either as to the number of evaluations or the order
> in which they happen, if we tried to do evaluation only during the
> actual sort.
If I could "only evaluate" the rows needed, then that would also
be a huge win, below case shifts what "definitely shouldn't be a seqscan"
into one due to a secondary sort key.
> Another small problem is that any such thing would require carrying
> along some kind of closure (ie, the expression and all its input
> values), not just the final sort key value, in tuples being sorted.
> The ensuing complexity, speed penalty, and increase in data volume
> to be sorted would be paid by everybody, making this probably a net
> performance loss when considered across all applications.
Getting the value for the first sortkey and carrying on a closure
for the rest would mostly (very often) be "optimal" ?
It would also enable a select that has to sortkeys to utilize an
index that only contains the primary sortkey, which is a huge
negative effect of what's being done today.
2011-04-18 07:12:43.931 testdb=# explain select id from testdb.testtable
order by id limit 500;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Limit (cost=0.00..262.75 rows=500 width=4)
-> Index Scan using testtable_pkey on testtable
(cost=0.00..15015567.84 rows=28573400 width=4)
(2 rows)
Time: 1.363 ms
2011-04-18 07:12:52.498 testdb=# explain select id from testdb.testtable
order by id,name limit 500;
QUERY PLAN
-----------------------------------------------------------------------------------
Limit (cost=5165456.70..5165457.95 rows=500 width=35)
-> Sort (cost=5165456.70..5236890.20 rows=28573400 width=35)
Sort Key: id, name
-> Seq Scan on testtable (cost=0.00..3741675.00
rows=28573400 width=35)
(4 rows)
Time: 1.420 ms
Enabling any users to sort using multiple keys, without ending in Seq
Scans somewhere down
the line seems to require multi dimension indexes on all combinations of
colums
--
Jesper
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2011-04-18 05:43:50 | Re: Formatting Curmudgeons WAS: MMAP Buffers |
Previous Message | Greg Smith | 2011-04-18 04:48:06 | Re: Formatting Curmudgeons WAS: MMAP Buffers |