From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andreas Kostyrka <andreas(at)kostyrka(dot)org> |
Cc: | "Ambrus Wagner (IJ/ETH)" <ambrus(dot)wagner(at)ericsson(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Merging large volumes of data |
Date: | 2007-05-07 15:09:28 |
Message-ID: | 6357.1178550568@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Andreas Kostyrka <andreas(at)kostyrka(dot)org> writes:
> Ambrus Wagner (IJ/ETH) wrote:
>> Is there a way to prevent PostgreSQL from doing a full sort on the result set after the unions have been completed? Even if I write
>>
>> (select a,b,c,d,e from table1 order by a,b) union all
>> (select a,b,c,d,e from table2 order by a,b) union all
>> etc...
>> (select a,b,c,d,e from tablen order by a,b) order by a,b;
>>
>> PostgreSQL does not seem to realise (maybe it should not be able to do this trick anyway) that the last "order by" clause is merely a final merge step on the ordered data sets.
At least to a first-order approximation, teaching it this would be a
waste of time.
Assume for simplicity that each of your K sub-selects yields N tuples.
Then there are KN items altogether, so if we just sort the big data set
it takes O(KN*log(KN)) time ... which is the same as O(KN*(log K + log N)).
OTOH, if we sort each sub-select by itself, that takes O(N*log N) time,
or O(KN*log N) for all K sub-sorts. Now we've got to do a K-way merge,
which will take O(log K) comparisons for each of the KN tuples, ie,
O(KN*log K)). Net result: exactly the same runtime.
Of course this argument fails if you have some way of obtaining the
sub-select values pre-sorted for free. But it's never really free.
Historical experience is that full-table indexscans often underperform
explicit sorts, at least when there are enough tuples involved to
make the problem interesting.
So the bottom line is that the use-case for this optimization seems
far too narrow to justify the implementation effort.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Jim Nasby | 2007-05-07 15:54:53 | Re: How to Find Cause of Long Vacuum Times - NOOB Question |
Previous Message | Gregory Stark | 2007-05-07 14:24:23 | Re: Merging large volumes of data |