From: | Jeremy Harris <jgh(at)wizmail(dot)org> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Minor performance improvement in transition to external sort |
Date: | 2014-02-06 22:30:01 |
Message-ID: | 52F40CE9.1070509@wizmail.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 06/02/14 14:22, Robert Haas wrote:
> On Thu, Feb 6, 2014 at 4:22 AM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
>> On 05/02/14 21:56, Robert Haas wrote:
>>> On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
>>>> The attached patch replaces the existing siftup method for heapify with
>>>> a siftdown method. Tested with random integers it does 18% fewer
>>>> compares and takes 10% less time for the heapify, over the work_mem
>>>> range 1024 to 1048576.
>>>>
>>>> Both algorithms appear to be O(n) (contradicting Wikipedia's claim
>>>> that a siftup heapify is O(n log n)).
>>>
>>>
>>> I think Wikipedia is right. Inserting an a tuple into a heap is O(lg
>>> n); doing that n times is therefore O(n lg n). Your patch likewise
>>> implements an outer loop which runs O(n) times and an inner loop which
>>> runs at most O(lg n) times, so a naive analysis of that algorithm also
>>> comes out to O(n lg n).
>>
>> The patch implements a siftdown. Measurements attached.
>
> Uh, this spreadsheet is useless. You haven't explained how these
> numbers were generated, or what the columns in the spreadsheet
> actually mean. Ideally, you should include enough information that
> someone else can try to reproduce your results.
>
I'm sorry I wasn't clear.
Test code was groups of the form:
set work_mem to 1024;
CREATE TABLE jgh (i integer);
INSERT INTO jgh (i) SELECT 20000 * random() FROM generate_series(1, 20000);
VACUUM ANALYZE jgh;
explain analyze SELECT * FROM jgh ORDER BY i;
set enable_siftdown_heapify to off;
explain analyze SELECT * FROM jgh ORDER BY i;
set enable_siftdown_heapify to on;
DROP TABLE jgh;
.. for the tabulated work_mem, and scaled values for the INSERT.
Compares were counted for the heapify stage (only), and timings
taken using pg_rusage_init() before and pg_rusage_show() after
it. Times are in seconds.
Baseline value for compares and time used 858ec11858.
Sift-down compares and time are for the patch.
"Baseline K cmps" is compares divided by work_mem (which should
be a scaled version of compares-per-tuple being heapified) pre-patch.
"Siftdown K cmps" is likewise with the patch.
"K ratio cmps" is the ratio (patched compares)/(unpatched compares).
Repeat for time measurements.
--
Cheers,
Jeremy
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Stark | 2014-02-06 22:41:19 | Re: Recovery inconsistencies, standby much larger than primary |
Previous Message | David Johnston | 2014-02-06 22:22:47 | Re: 'dml' value for log_statement |