From: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Incremental sort |
Date: | 2017-09-15 23:46:05 |
Message-ID: | CAPpHfdv9_-a4ejaEUqeQ31NWgMokfuu2_XS4MGRa3UcDO7Meiw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Sep 14, 2017 at 2:48 AM, Alexander Korotkov <
a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> Patch rebased to current master is attached. I'm going to improve my
> testing script and post new results.
>
New benchmarking script and results are attached. There new dataset
parameter is introduced: skew factor. Skew factor defines skew in
distribution of groups sizes.
My idea of generating is just usage of power function where power is from 0
to 1. Following formula is used to get group number for particular item
number i.
[((i / number_of_indexes) ^ power) * number_of_groups]
For example, power = 1/6 gives following distribution of groups sizes:
group number group size
0 2
1 63
2 665
3 3367
4 11529
5 31031
6 70993
7 144495
8 269297
9 468558
For convenience, instead of power itself, I use skew factor where power =
1.0 / (1.0 + skew). Therefore, with skew = 0.0, distribution of groups
sizes is uniform. Larger skew gives more skewed distribution (and that
seems to be quite intuitive). For, negative skew, group sizes are mirrored
as for corresponding positive skew. For example, skew factor = -5.0 gives
following groups sizes distribution:
group number group size
0 468558
1 269297
2 144495
3 70993
4 31031
5 11529
6 3367
7 665
8 63
9 2
Results shows that between 2172 test cases, in 2113 incremental sort gives
speedup while in 59 it causes slowdown. The following 4 test cases show
most significant slowdown (>10% of time).
Table GroupedCols GroupCount Skew PreorderedFrac
FullSortMedian IncSortMedian TimeChangePercent
int4|int4|numeric 1 100 -10 0
1.5688240528 2.0607631207 31.36
text|int8|text|int4 1 1 0 0
1.7785198689 2.1816160679 22.66
int8|int8|int4 1 10 -10 0
1.136412859 1.3166360855 15.86
numeric|text|int4|int8 2 10 -10 1
0.4403841496 0.5070910454 15.15
As you can see, 3 of this 4 test cases have skewed distribution while one
of them is related to costly location-aware comparison of text. I've no
particular idea of how to cope these slowdowns. Probably, it's OK to have
slowdown in some cases while have speedup in majority of cases (assuming
there is an option to turn off new behavior). Probably, we should teach
optimizer more about skewed distributions of groups, but that doesn't seem
feasible for me.
Any thoughts?
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
incsort_test.py | text/x-python-script | 3.5 KB |
results.csv | text/csv | 151.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alexander Korotkov | 2017-09-15 23:51:35 | Re: Should we cacheline align PGXACT? |
Previous Message | Nikita Glukhov | 2017-09-15 23:31:43 | Re: SQL/JSON in PostgreSQL |