From: Yao Wang <yao-yw(dot)wang(at)broadcom(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, John Naylor <johncnaylorls(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Wang Yao <yaowangm(at)outlook(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Hongxu Ma <hongxu(dot)ma(at)broadcom(dot)com>
Subject: Re: An implementation of multi-key sort
Date: 2024-07-26 11:18:35
Message-ID: CACq1W05CY=kJHA4Nq3hq52eskoG8GpZUdrtR+FF4u+F9BRa9tQ@mail.gmail.com
Lists: pgsql-hackers
Thanks for all of your comments.
Progress
--------
Because there are too many details under discussion, let me summarize them
into two major issues:
1. Since performance already looks good for data types without a specialized
comparator, can we improve the performance for data types with a specialized
comparator to a satisfactory level?
I refactored some code on the mksort code path and eliminated several
performance bottlenecks. With the latest code, most results show mksort
performing better than qsort. In the remaining cases, the regressions are
usually less than 5%, with rare exceptions. Since most of the exceptions are
transient (appearing occasionally within a series of "normal" results), I
believe they are caused by external interference, because the test machine
is not dedicated. Let's discuss based on the latest code.
2. Should we update the optimizer to take statistics
(pg_statistic->stadistinct) into account and choose mksort/qsort accordingly?
The difficulty here is not just the reliability of stadistinct. Sort
performance is affected by several factors: sort key count, distinct ratio,
data type, and data layout. For example, with the int type, 5 sort keys, and
a distinct ratio of about 0.05, we see about a 1% improvement for the
"random" data set, about 7% for the "correlated" data set, and almost no
difference for the "sequential" data set because of the pre-ordered check.
So it is pretty difficult to create a formula that calculates the benefit.
Anyway, I made an implementation that adds an "mkqsApplicable" flag
determined by the optimizer, so we can discuss based on code and performance
results. I also updated the test script to add an extra column, "mk_enabled",
indicating whether mksort is enabled by the optimizer.
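For discussion purposes only, here is a minimal sketch of the shape such a
planner-side check could take. The function name, constants, and thresholds
are hypothetical and are not the formula used in v6.1; the only hard fact it
relies on is that a negative stadistinct in pg_statistic means
"negative fraction of rows".

#include <stdbool.h>

/*
 * Hypothetical planner-side check: allow mksort only when there are
 * multiple sort keys and the first key is expected to have relatively
 * few distinct values.  Names and thresholds are illustrative only.
 */
#define MKQS_MIN_NKEYS           2
#define MKQS_MAX_DISTINCT_RATIO  0.1

static bool
mkqs_applicable_demo(int nkeys, double stadistinct_first_key, double nrows)
{
    double      ndistinct = stadistinct_first_key;

    if (nkeys < MKQS_MIN_NKEYS || nrows <= 0.0 || ndistinct == 0.0)
        return false;           /* no statistics: fall back to qsort */

    /* In pg_statistic, a negative stadistinct is "-(fraction of rows)". */
    if (ndistinct < 0.0)
        ndistinct = -ndistinct * nrows;

    return (ndistinct / nrows) <= MKQS_MAX_DISTINCT_RATIO;
}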
According to the results, the choosing mechanism in the optimizer eliminated
almost all regressions. Please note that even when mksort is disabled (i.e.
qsort is actually performed in both runs), there are still "regressions",
which are usually less than 5% and should be attributed to measurement noise.
I am still hesitant about putting this code into the final version because
of a number of concerns:
a. For sort keys without a real table, stadistinct is unavailable.
b. stadistinct may not be accurate, as mentioned.
c. The formula I used may not cover all possible cases.
On the other hand, the worst result is only about a 5% regression, so
perhaps the side effects are not that serious?
I can refine the code further (e.g. for now mkqsApplicable is valid only on
some particular code paths), but I would prefer to do that after we have a
clear decision about whether the optimizer code is needed.
Please let me know your comments, thanks.
Attachments
-----------
- v6-Implement-multi-key-quick-sort.patch
(v6) The latest code without optimizer change
- v6-1-add-Sort-ndistInFirstRow.patch
(v6.1) The latest code with optimizer change, can be applied on top of v6
- mksort-test-v2.sh
The script written by Tomas and modified by me to produce the test results
- new_report.txt
Test results on a small data set (100,000 rows) based on v6
- new_report_opti.txt
Test results on a small data set (100,000 rows) based on v6.1
I tried to produce a "full" report covering all data ranges, but it kept
running for more than 15 hours on my machine and was repeatedly disturbed by
other heavy loads. However, I did run tests on some datasets of other sizes
and got similar results.
Answers for other questions
---------------------------
1. Can we enable mksort for just particular data types?
As I mentioned, it is not easy to make that decision considering all the
factors impacting the result and all their possible combinations. The
optimizer code I showed may be a starting point.
2. Does v5 include the code about "distinct stats info" and others?
No. As I mentioned, none of the code under "Potential improvement spaces"
was included in v5 (or implemented at all). v6.1 includes some code in the
optimizer.
3. Should we remove the GUC enable_mk_sort?
I have kept it at least for the development phase, and I would prefer to
keep it permanently in case there are scenarios we are not aware of.
4. Build failure on 32-bit ARM
It was my coding mistake. ApplySignedSortComparator() is built only when
SIZEOF_DATUM >= 8. I was aware of that, but missed wrapping all the relevant
code in that condition. It should be fixed in v6, but I don't have a 32-bit
ARM platform to verify on. @Tomas, please give it a try if you are still
interested, thanks.
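For reference, the kind of guard this needs looks roughly like the sketch
below. mkqs_compare_datum_signed() is a hypothetical helper name used only
for illustration; ApplySignedSortComparator() and SIZEOF_DATUM come from the
existing tree, where the signed comparator is only defined for 8-byte Datums.

#include "postgres.h"
#include "utils/sortsupport.h"

#if SIZEOF_DATUM >= 8
/*
 * ApplySignedSortComparator() only exists when Datum is at least 8 bytes,
 * so this helper, and every caller of it, must sit inside the same
 * preprocessor guard, otherwise 32-bit builds (e.g. 32-bit ARM) fail.
 */
static int
mkqs_compare_datum_signed(Datum a, Datum b, SortSupport ssup)
{
    return ApplySignedSortComparator(a, false, b, false, ssup);
}
#endif                          /* SIZEOF_DATUM >= 8 */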
5. How could templating address the regressions?
Please refer to the implementation of qsort in sort_template.h, which
implements a kind of template mechanism using macros, since C has no
built-in templates. For example, the comparator is a macro, ST_COMPARE,
which is specialized to different functions (such as
qsort_tuple_unsigned_compare()) for different data types. By contrast,
mksort needs to determine the comparator at runtime for each comparison
(see mkqs_compare_tuple()), which costs more. Although the cost is small,
the comparison path is very performance sensitive. (About a 1~2% regression,
if my memory is correct.)
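To make the contrast concrete, here is a compressed sketch of the two
dispatch styles. It is illustrative only and is not the actual
sort_template.h or patch code; ST_COMPARE is shown in the same spirit as the
real macro, while the function names are invented for the example.

/* Compile-time specialization, in the spirit of sort_template.h: the
 * comparator is fixed by a macro, so the specialized sort function can
 * inline it at every call site. */
static inline int
compare_unsigned(const unsigned long *a, const unsigned long *b)
{
    return (*a > *b) - (*a < *b);
}

#define ST_COMPARE(a, b) compare_unsigned(a, b)

static void
specialized_swap_if_greater(unsigned long *a, unsigned long *b)
{
    if (ST_COMPARE(a, b) > 0)           /* direct, inlinable call */
    {
        unsigned long tmp = *a;
        *a = *b;
        *b = tmp;
    }
}

/* Runtime dispatch, which a multi-key sort over arbitrary key types needs:
 * the comparator is resolved through a function pointer on every comparison,
 * which the compiler usually cannot inline.  On a hot comparison path even
 * that small indirection is measurable. */
typedef int (*key_comparator) (const void *a, const void *b);

static void
generic_swap_if_greater(unsigned long *a, unsigned long *b, key_comparator cmp)
{
    if (cmp(a, b) > 0)
    {
        unsigned long tmp = *a;
        *a = *b;
        *b = tmp;
    }
}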
Thanks,
Yao Wang
On Wed, Jul 10, 2024 at 2:58 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Sun, Jul 7, 2024 at 2:32 AM Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:
> > If mksort really provides advantage only when there are a lot of
> > duplicates (for prefix keys?) and of small fraction of duplicates there
> > is even some (small) regression
> > then IMHO taking in account in planner information about estimated
> > number of distinct values seems to be really important.
>
> I don't think we can rely on the planner's n_distinct estimates for
> this at all. That information tends to be massively unreliable when we
> have it at all. If we rely on it for good performance, it will be easy
> to find cases where it's wrong and performance is bad.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
new_report.txt | text/plain | 121.0 KB |
new_report_opti.txt | text/plain | 120.7 KB |
mksort-test-v2.sh | application/x-sh | 4.8 KB |
v6-Implement-multi-key-quick-sort.patch | application/octet-stream | 88.1 KB |
v6-1-add-Sort-ndistInFirstRow.patch | application/octet-stream | 21.0 KB |