From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie>, Greg Stark <stark(at)mit(dot)edu> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: The case for removing replacement selection sort |
Date: | 2017-09-11 00:07:36 |
Message-ID: | 131a54e5-c8ea-89e7-6f95-445d07a24972@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 09/11/2017 01:03 AM, Peter Geoghegan wrote:
> On Sat, Sep 9, 2017 at 8:28 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
>> This may be a bit "how long is a piece of string" but how do those two
>> compare with string sorting in an interesting encoding/locale -- say
>> /usr/share/dict/polish in pl_PL for example. It's certainly true that
>> people do sort text as well as numbers.
>
> I haven't looked at text, because I presume that it's very rare for
> tuples within a table to be more or less ordered by a text
> attribute. While the traditional big advantage of replacement
> selection has always been its ability to double initial run length on
> average, where text performance is quite interesting because
> localized clustering still happens, that doesn't really seem relevant
> here. The remaining use of replacement selection is expressly only
> about the "single run, no merge" best case, and in particular about
> avoiding regressions when upgrading from versions prior to 9.6 (9.6
> is the version where we began to generally prefer quicksort).
>
>> Also, people often sort on keys of more than one column....
>
> Very true. I should test this.
>
I'm currently re-running the benchmarks we did in 2016 for 9.6, but
those are all sorts with a single column (see the attached script). But
it'd be good to add a few queries testing sorts with multiple keys. We
can either tweak some of the existing data sets + queries, or come up
with entirely new tests.
The current tests construct data sets with different features (unique or
low/high-cardinality, random/sorted/almost-sorted). How should that work
for multi-key sorts? Same features for all columns, or some mix?
For the existing queries, I should have some initial results tomorrow,
at least for the data sets with 100k and 1M rows. The tests with 10M
rows will take much more time (it takes 1-2hours for a single work_mem
value, and we're testing 6 of them).
But perhaps we can reduce the number of tests somehow, only testing the
largest/smallest work_mem values? So that we could get some numbers now,
possibly running the complete test suite later.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
sort-bench.sh | application/x-shellscript | 15.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Janes | 2017-09-11 00:10:17 | Re: Automatic testing of patches in commit fest |
Previous Message | Michael Paquier | 2017-09-10 23:40:14 | Automatic testing of patches in commit fest |