From: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Incremental sort |
Date: | 2017-10-02 16:37:03 |
Message-ID: | CAPpHfduS7SBC_J0frWX1tToZkKyyAjLVDQH1wcS7Rbvjg6nV-g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Sep 30, 2017 at 11:20 PM, Alexander Korotkov <
a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> On Sat, Sep 16, 2017 at 2:46 AM, Alexander Korotkov <
> a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>
>> On Thu, Sep 14, 2017 at 2:48 AM, Alexander Korotkov <
>> a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>>
>>> Patch rebased to current master is attached. I'm going to improve my
>>> testing script and post new results.
>>>
>>
>> New benchmarking script and results are attached. There new dataset
>> parameter is introduced: skew factor. Skew factor defines skew in
>> distribution of groups sizes.
>> My idea of generating is just usage of power function where power is from
>> 0 to 1. Following formula is used to get group number for particular item
>> number i.
>> [((i / number_of_indexes) ^ power) * number_of_groups]
>> For example, power = 1/6 gives following distribution of groups sizes:
>> group number group size
>> 0 2
>> 1 63
>> 2 665
>> 3 3367
>> 4 11529
>> 5 31031
>> 6 70993
>> 7 144495
>> 8 269297
>> 9 468558
>>
>> For convenience, instead of power itself, I use skew factor where power =
>> 1.0 / (1.0 + skew). Therefore, with skew = 0.0, distribution of groups
>> sizes is uniform. Larger skew gives more skewed distribution (and that
>> seems to be quite intuitive). For, negative skew, group sizes are mirrored
>> as for corresponding positive skew. For example, skew factor = -5.0 gives
>> following groups sizes distribution:
>> group number group size
>> 0 468558
>> 1 269297
>> 2 144495
>> 3 70993
>> 4 31031
>> 5 11529
>> 6 3367
>> 7 665
>> 8 63
>> 9 2
>>
>> Results shows that between 2172 test cases, in 2113 incremental sort
>> gives speedup while in 59 it causes slowdown. The following 4 test cases
>> show most significant slowdown (>10% of time).
>>
>> Table GroupedCols GroupCount Skew PreorderedFrac
>> FullSortMedian IncSortMedian TimeChangePercent
>> int4|int4|numeric 1 100 -10 0
>> 1.5688240528 2.0607631207 31.36
>> text|int8|text|int4 1 1 0 0
>> 1.7785198689 <(778)%20519-8689> 2.1816160679 22.66
>> int8|int8|int4 1 10 -10 0
>> 1.136412859 1.3166360855 15.86
>> numeric|text|int4|int8 2 10 -10 1
>> 0.4403841496 0.5070910454 15.15
>>
>> As you can see, 3 of this 4 test cases have skewed distribution while one
>> of them is related to costly location-aware comparison of text. I've no
>> particular idea of how to cope these slowdowns. Probably, it's OK to have
>> slowdown in some cases while have speedup in majority of cases (assuming
>> there is an option to turn off new behavior). Probably, we should teach
>> optimizer more about skewed distributions of groups, but that doesn't seem
>> feasible for me.
>>
>> Any thoughts?
>>
>
> BTW, replacement selection sort was removed by 8b304b8b. I think it worth
> to rerun benchmarks after that, because results might be changed. Will do.
>
I've applied patch on top of c12d570f and rerun the same benchmarks.
CSV-file with results is attached. There is no dramatical changes. There
is still minority of performance regression cases while majority of cases
has improvement.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
results2.csv | text/csv | 151.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2017-10-02 16:42:36 | Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it? |
Previous Message | Robert Haas | 2017-10-02 16:36:02 | Re: issue: record or row variable cannot be part of multiple-item INTO list |