Re: Re: An implementation of multi-key sort

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Yao Wang <yao-yw(dot)wang(at)broadcom(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: Wang Yao <yaowangm(at)outlook(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Hongxu Ma <hongxu(dot)ma(at)broadcom(dot)com>
Subject: Re: Re: An implementation of multi-key sort
Date: 2024-07-08 15:14:38
Message-ID: b145ea73-3ddb-4568-b9a6-1f5b59b6dfe1@enterprisedb.com
Lists: pgsql-hackers

On 7/4/24 14:45, Yao Wang wrote:
> Hi John,
>
> Thanks for your kind message. I talked to Heikki before getting Tomas's
> response, and he said "no promise but I will take a look". That's why I
> added his email. I have updated the CF entry and added Tomas as reviewer.
>
> Hi Tomas,
>
> Again, a big thank you. The report and script are really, really
> helpful, and your ideas are very valuable.
>
> Firstly, the expectation of mksort performance:
>
> 1. When mksort works well, it should be faster than qsort because it saves
> the cost of repeatedly comparing duplicate values.
> 2. When all values are distinct at a particular column, the comparison
> finishes immediately, and mksort actually falls back to qsort. In that
> case, mksort should be equal to or a bit slower than qsort, because it
> needs to maintain more complex state.
> 
> Generally, the benefit of mksort comes mainly from duplicate values and
> sort keys: the more duplicate values and sort keys there are, the bigger
> the benefit.
>
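
(For anyone following along: this is the classic Bentley/Sedgewick
multi-key quicksort - partition on the current key only, and descend to
the next key just for the group that compared equal, which is exactly
where the duplicate-related savings come from. A minimal sketch on plain
C strings, with none of the PostgreSQL specifics:)

    #include <stdio.h>

    static void
    swap_ptrs(const char **a, const char **b)
    {
        const char *t = *a; *a = *b; *b = t;
    }

    /*
     * Multi-key quicksort on C strings: three-way partition on the byte
     * at `depth`, then recurse into the "equal" group with depth + 1.
     * Only items that compare equal on the current key ever pay for the
     * next key - that is where the savings over plain qsort come from.
     */
    static void
    mksort(const char **a, int n, int depth)
    {
        int pivot, lt = 0, i = 0, gt = n;

        if (n <= 1)
            return;

        pivot = (unsigned char) a[n / 2][depth];
        while (i < gt)
        {
            int c = (unsigned char) a[i][depth];

            if (c < pivot)
                swap_ptrs(&a[lt++], &a[i++]);
            else if (c > pivot)
                swap_ptrs(&a[i], &a[--gt]);
            else
                i++;
        }

        mksort(a, lt, depth);                   /* < pivot: same key */
        if (pivot != '\0')
            mksort(a + lt, gt - lt, depth + 1); /* = pivot: next key */
        mksort(a + gt, n - gt, depth);          /* > pivot: same key */
    }

    int
    main(void)
    {
        const char *v[] = {"abc", "abd", "ab", "aaa", "abc"};
        int i;

        mksort(v, 5, 0);
        for (i = 0; i < 5; i++)
            puts(v[i]);
        return 0;
    }
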
> Analysis on the report in your previous mail
> --------------------------------------------
>
> 1. It seems the script uses $count to control the number of duplicate values:
>
> number of repetitions for each value (ndistinct = nrows/count)
>
> However, that is not always the case. For the text type, the script generates
> values like this:
>
> expr="md5(((i / $count) + random())::text)"
>
> But md5() generates totally random values regardless of $count. Some cases
> of timestamptz have the same problem.
>
> With all values distinct, the sort finishes at the first depth and
> effectively falls back to qsort.
>

You're right, thanks for noticing / fixing this.

> 2. Even for the types with a correct duplicate setting, the duplicate ratio
> is very small: e.g. with $nrows = 10000 and $count = 100, only 1% of
> duplicated rows can go to depth 2, and only 0.01% of them can go to depth 3.
> So the sort still works on nearly all-distinct values.
>

True, but that's why the scripts test with much larger data sets too,
with more comparisons needing to look at other columns. It'd be possible
to construct data sets that are likely to benefit more from mksort - I'm
not against doing that, but then there's the question of which data sets
are more representative of what users actually do.

I'd say random data sets like the ones I used are fairly common - it's
fine to not improve them, but we should not regress them.

> 3. Qsort in PG17 uses a kind of specialization for the tuple comparator,
> i.e. it uses specialized functions for different types, e.g.
> qsort_tuple_unsigned() for unsigned ints. The specialized comparators avoid
> all type-related checks and are much faster than the regular comparator.
> That is why we saw regressions of 200% or more for those cases.
>

OK, I'm not familiar with this code enough to have an opinion.
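
For the archives, my understanding of the general technique: the regular
path calls the comparator through a function pointer (with per-type
dispatch inside), while the specialized path is a direct comparison the
compiler can inline. A contrived sketch of the difference, not the
actual tuplesort code:

    #include <stdint.h>
    #include <stdlib.h>

    /*
     * Generic path: qsort() reaches this through a function pointer for
     * every single comparison, and a real tuple comparator additionally
     * has to dispatch on the column's type inside.
     */
    static int
    cmp_uint32_generic(const void *a, const void *b)
    {
        uint32_t x = *(const uint32_t *) a;
        uint32_t y = *(const uint32_t *) b;

        return (x > y) - (x < y);
    }

    /*
     * Specialized path: the comparison is a direct expression the
     * compiler can inline - no indirect call, no type dispatch.
     */
    static void
    sort_uint32_specialized(uint32_t *a, size_t n)
    {
        size_t i;

        for (i = 1; i < n; i++)
        {
            uint32_t key = a[i];
            size_t j = i;

            while (j > 0 && a[j - 1] > key)  /* direct comparison */
            {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = key;
        }
    }

    /*
     * Usage: qsort(a, n, sizeof(uint32_t), cmp_uint32_generic)
     * versus sort_uint32_specialized(a, n).
     */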

>
> Code optimizations I did for mk qsort
> -------------------------------------
>
> 1. Adapted the specialization for the tuple comparator.
> 2. Used a kind of "hybrid" sort: when we would switch to bubble sort anyway
> due to the limited number of sort items, use a bubble sort that compares
> datums starting from the specified depth.
> 3. Other optimizations, such as a pre-ordered check (sketched below).
>
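
The pre-ordered check is presumably the usual trick: one linear scan
before sorting, skipping the sort entirely when the input is already
ordered. My sketch of the idea, not the patch's code:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Pre-ordered check: one linear scan before sorting; if no adjacent
     * pair is out of order, the input is already sorted and the whole
     * O(n log n) pass can be skipped.
     */
    static bool
    is_presorted_uint32(const uint32_t *a, size_t n)
    {
        size_t i;

        for (i = 1; i < n; i++)
        {
            if (a[i - 1] > a[i])
                return false;
        }
        return true;
    }
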
>
> Analysis on the new report
> --------------------------
>
> I also made some modifications to your script to fix the data type issues,
> plus added output of the distinct value count/distinct ratio and an
> indicator for improvement/regression. I attached the new script and a
> report on a data set with 100,000 rows and 2, 5, and 8 columns.
>

OK, but I think a report for a single data set size is not sufficient to
evaluate a patch like this - it can easily miss various caching effects
etc. The results I shared a couple of minutes ago are from 1000 to 10M
rows, and give a much more complete view.

> 1. Generally, the results match the expectation: "When mksort works well, it
> should be faster than qsort; when mksort falls back to qsort, it should be
> equal to or a bit slower than qsort."

The challenge is how to know in advance if mksort is likely to work well.

> 2. For all "sequential" cases (except the text type), mksort is a bit slower
> than qsort, because no actual sort is performed due to the "pre-ordered"
> check.

OK

> 3. For the int and bigint types, mksort becomes faster and faster as
> duplicate values and sort keys increase. The improvement in the best cases
> is about 58% (line 333) and 57% (line 711).

I find it hard to interpret the text-only report, but I suppose these
are essentially the "green" patches in the PDF report I attached to my
earlier message. And indeed, there are nice improvements, but only in
cases with very many duplicates, and the price for that is 10-20%
regressions in the other cases. That does not seem like a great
trade-off to me.

> 4. For the timestamptz type, mksort is a bit slower than qsort because the
> distinct ratio is 1 in almost all cases. I think more benefit would be
> available by increasing the number of duplicate values.

Yeah, this was a bug in my script, generating too many distinct values.
After fixing that, it behaves pretty much exactly like int/bigint, which
is not really surprising.

> 5. For the text type, mksort is faster than qsort in all cases, and the
> improvement in the best case is about 160% (line 1510). It is the only
> tested type for which the specialized comparators are disabled.
>

Correct.

> Obviously, text shows a much bigger improvement than the others. I suppose
> the cause is the specialized comparators: for the types that have them, the
> comparison is so fast that the cost saved by mksort is not significant.
> Only when the saved cost becomes big enough can mksort beat qsort.
> 
> For types without specialized comparators, mksort can beat qsort
> completely. That is the "real" performance of mksort.
>

No opinion, but if this is the case, then maybe the best solution is to
use mksort only for types without specialized comparators.

>
> Answers for some other questions you mentioned
> ----------------------------------------------
>
> Q1: Why are almost all of the cases that got better single-column sorts?
> 
> A: mksort is enabled only for multi-column sorts. When there is only one
> column, plain qsort is used, so we can simply ignore those cases.
>
> Q2: Why did performance get worse just by reversing the sort keys?
>
> A: In the example we used, the sort keys are ordered from most duplicated
> to least. Please see the SQL:
>
> update t1 set c2 = c1 % 100, c3 = c1 % 50, c4 = c1 % 10, c5 = c1 % 3;
> update t1 set c6 = 'aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbb'
> || (c1 % 5)::text;
>
> So c6 has the most duplicate values, c5 fewer, and so on. With the order
> "c6, c5, c4, ...", mksort can take effect on every sort key.
> 
> With the reverse order "c2, c3, c4...", mksort almost finishes on the
> first sort key (c2), because it has only 1% duplicated values, and
> effectively falls back to qsort.
>
> Based on the new code, I reran the example and got about a 141% improvement
> for the order "c6, c5, c4..." and about a 4% regression for the order
> "c2, c3, c4...".
>

OK

> Q3: Does mksort work effectively only for particular types, e.g. strings?
>
> A: No, the implementation of mksort does not single out any data type for
> special handling. It just calls the existing comparators, which are also
> used by qsort. I used a long prefix for the strings just to increase the
> comparison cost and amplify the result. The new report shows that mksort
> works effectively on non-string types and on strings without a long prefix.
>

Maybe I misunderstood, but it seems to help much less for types with
specialized comparators, so maybe I'd rephrase this to say it only works
effectively for types without them.

> Q4: Was the algorithm a good fit for the hardware of its time, and have
> changes since then (e.g. the growing importance of on-CPU caches) made it
> less relevant?
> 
> A: To my understanding, the answer is no, because the benefit of mksort
> comes from saving the cost of duplicate comparisons, which is not related
> to hardware. I suppose the new report demonstrates that.
>
> However, hardware variation definitely affects performance, especially
> considering that the difference between mksort and qsort is not that big
> when mksort falls back to qsort. I am not able to test on a wide range of
> hardware, so any findings are appreciated.
>

OK. FWIW I think it's important to test with a range of data set sizes
exactly to evaluate these hardware-related effects (some of which are
related to the sizes of various caches).

>
> Potential improvements
> ----------------------
>
> I tried some other optimizations but didn't add the code in the end,
> because the benefit is uncertain and/or the implementation is complex. I
> raise them here for more discussion if necessary:
>
> 1. Use the table's distinct-value statistics to enable mksort
>
> It's a kind of heuristic: in the optimizer, check the table's
> Form_pg_statistic->stadistinct via pg_statistic, and enable mksort only
> when it is below a threshold.
>
> The hacked code works, though it needs to modify a couple of optimizer
> interfaces. In addition, a complete solution should consider the types and
> distinct values of all columns, which might be too complex, and the benefit
> seems limited.
>

I assume that's not in v5, or did I miss a part of the patch doing it?

In any case, I suspect relying on stadistinct is going to be unreliable.
It's known to be pretty likely to be off, especially when this involves
multiple columns, which can be correlated in some way.

It would be much better if we could make this decision at runtime, based
on some cheap heuristics. Not sure if that's possible, though.
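
For the record, a stadistinct-based check would presumably be something
along these lines - a hypothetical helper, relying on the pg_statistic
convention that a positive stadistinct is an absolute count of distinct
values and a negative one is minus the fraction of rows; the threshold
is a made-up placeholder, not a tuned value:

    #include <stdbool.h>

    /*
     * Hypothetical helper (not in the patch): decide whether mksort
     * looks worthwhile for a column, given pg_statistic's stadistinct
     * and the expected row count.  stadistinct > 0 is an absolute
     * number of distinct values; stadistinct < 0 is minus the fraction
     * of rows that are distinct; 0 means unknown.
     */
    static bool
    mksort_looks_worthwhile(double stadistinct, double nrows)
    {
        double ndistinct;

        if (stadistinct == 0 || nrows <= 0)
            return false;       /* unknown: play it safe, keep qsort */

        ndistinct = (stadistinct > 0) ? stadistinct
                                      : -stadistinct * nrows;

        return ndistinct / nrows < 0.1;  /* expect many duplicates */
    }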

> 2. Cache of datum positions
>
> e.g. for a heap tuple, we need to extract the datum position from the
> SortTuple via extract_heaptuple_from_sorttuple() for comparison, and this
> is executed for each datum. By comparison, qsort does it once per tuple.
> Theoretically we could create a cache that remembers the datum positions
> to avoid the duplicated extraction.
>
> The hacked code works, but the improvement seems limited. I'm not sure
> whether there is more room for improvement.
>

No idea. But it seems more like an independent optimization than a fix
for the cases where mksort v5 regresses, right?
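
If I understand the idea correctly, it's roughly this - a hypothetical
sketch, with extract_column() standing in for the patch's
extract_heaptuple_from_sorttuple():

    #include <stddef.h>
    #include <stdlib.h>

    typedef unsigned long Datum;    /* stand-in for PostgreSQL's Datum */

    typedef struct SortItem
    {
        void  *tuple;               /* the tuple in its on-disk form */
        Datum *datums;              /* cached, already-extracted columns */
    } SortItem;

    /*
     * Stand-in for the per-datum extraction; the real function walks
     * the tuple to find the column's value.
     */
    static Datum
    extract_column(void *tuple, int col)
    {
        (void) tuple;
        (void) col;
        return 0;
    }

    /*
     * Fill the cache once per tuple, so that a comparison at any depth
     * reads items[i].datums[col] instead of re-extracting from the
     * tuple on every comparison.
     */
    static void
    fill_datum_cache(SortItem *items, size_t n, int ncols)
    {
        size_t i;

        for (i = 0; i < n; i++)
        {
            int col;

            items[i].datums = malloc(sizeof(Datum) * ncols);
            for (col = 0; col < ncols; col++)
                items[i].datums[col] = extract_column(items[i].tuple, col);
        }
    }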

> 3. Template mechanism
>
> Qsort uses a kind of template mechanism via macros (see sort_template.h),
> which avoids the cost of runtime type checks. Theoretically the template
> mechanism could be applied to mksort as well, but I am hesitant because it
> would impose more complexity and make the code harder to maintain.
>

No idea, but same as above - I don't see how templating could address
the regressions.
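
(For readers unfamiliar with the technique: sort_template.h has the
caller define a few macros and then include the header, which stamps out
a fully specialized sort function. A stripped-down illustration of the
pattern - not the actual header:)

    /*
     * sort_template_demo.h - stripped-down illustration of the pattern
     * (not PostgreSQL's sort_template.h).  The caller defines ST_SORT
     * (the function name), ST_TYPE (the element type) and
     * ST_COMPARE(a, b) before including this file; the preprocessor
     * then emits a fully specialized sort with no indirect calls.
     * Assumes <stddef.h> is already included.
     */
    static void
    ST_SORT(ST_TYPE *a, size_t n)
    {
        size_t i;

        for (i = 1; i < n; i++)
        {
            ST_TYPE key = a[i];
            size_t j = i;

            while (j > 0 && ST_COMPARE(key, a[j - 1]) < 0)
            {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = key;
        }
    }
    #undef ST_SORT
    #undef ST_TYPE
    #undef ST_COMPARE

Instantiating it in a .c file then looks like:

    #include <stddef.h>
    #include <stdint.h>

    #define ST_SORT sort_int32
    #define ST_TYPE int32_t
    #define ST_COMPARE(a, b) (((a) > (b)) - ((a) < (b)))
    #include "sort_template_demo.h"   /* defines sort_int32() */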

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
