Re: The case for removing replacement selection sort

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: The case for removing replacement selection sort
Date:	2017-09-11 00:59:07
Message-ID:	11546c13-f194-5078-f5cd-9e44ef8f04b1@2ndquadrant.com
Lists:	pgsql-hackers

On 09/11/2017 02:22 AM, Peter Geoghegan wrote:
> On Sun, Sep 10, 2017 at 5:07 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> I'm currently re-running the benchmarks we did in 2016 for 9.6, but
>> those are all sorts with a single column (see the attached script). But
>> it'd be good to add a few queries testing sorts with multiple keys. We
>> can either tweak some of the existing data sets + queries, or come up
>> with entirely new tests.
>
> I see that work_mem is set like this in the script:
>
> "for wm in '1MB' '8MB' '32MB' '128MB' '512MB' '1GB'; do"
>
> I suggest that we forget about values over 32MB, since the question of
> how well quicksort does there was settled by your tests in 2016. I
> would also add '4MB' to the list of wm values that you'll actually
> test.

OK, so 1MB, 4MB, 8MB, 32MB?

>
> Any case with input that is initially in random order or DESC sort
> order is not interesting, either. I suggest you remove those, too.
>

OK.

> I think we're only interested in benchmarks where replacement
> selection really does get its putative best case (no merge needed in
> the end). Any (almost) sorted cases (the only cases that are
> interesting to test now) will always manage that, once you set
> replacement_sort_tuples high enough, and provided there isn't even a
> single tuple that is completely out of order. The "before" cases here
> should have a replacement_sort_tuples of 1 billion (so that we're sure
> to not have the limit prevent the use of replacement selection in the
> first place), versus the "after" cases, which should have a
> replacement_sort_tuples of 0 to represent my proposal (to represent
> performance in a world where replacement selection is totally
> removed).
>
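The "putative best case" can be made concrete with a toy model of replacement selection run generation. This is a minimal sketch, not PostgreSQL's tuplesort.c: memory is counted in records rather than bytes, and `replacement_selection_runs` is a hypothetical helper. A record smaller than the one just written cannot extend the current run and is deferred to the next one. As a result, almost sorted input yields a single run, so no merge is needed. A single tuple that arrives after its position has already been written forces a second run. Strictly DESC input degrades to runs of at most `memory` records, which is why those cases cannot show any advantage for replacement selection.

```python
import heapq
from typing import Iterable, List


def replacement_selection_runs(records: Iterable[int], memory: int) -> List[List[int]]:
    """Split `records` into sorted runs using replacement selection.

    `memory` is the number of records the heap may hold at once (a
    hypothetical stand-in for work_mem, measured in tuples; must be >= 1).
    """
    it = iter(records)
    heap = []
    # Fill the heap; every initially loaded record belongs to run 0.
    for rec in it:
        heap.append((0, rec))
        if len(heap) == memory:
            break
    heapq.heapify(heap)

    runs: List[List[int]] = []
    while heap:
        # Heap entries are (run number, value), so all records of the
        # current run are emitted before any record of the next run.
        run_no, rec = heapq.heappop(heap)
        if run_no == len(runs):
            runs.append([])  # first record of a new run
        runs[run_no].append(rec)
        nxt = next(it, None)
        if nxt is not None:
            # A record smaller than the one just written cannot join the
            # current run without breaking its order, so defer it.
            heapq.heappush(heap, (run_no if nxt >= rec else run_no + 1, nxt))
    return runs


if __name__ == "__main__":
    M = 100  # hypothetical heap capacity, in records
    nearly_sorted = list(range(1000))
    nearly_sorted[500], nearly_sorted[501] = nearly_sorted[501], nearly_sorted[500]
    print(len(replacement_selection_runs(nearly_sorted, M)))  # 1: no merge needed
    print(len(replacement_selection_runs(list(range(1, 1001)) + [0], M)))  # 2: one stray tuple
    print(len(replacement_selection_runs(range(1000, 0, -1), M)))  # 10: DESC input
```

A local swap is absorbed because the displaced record is read while the current run's output is still behind it. The trailing `0` arrives only after the run has moved past it, and that alone is enough to require a final merge.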

Ah, so you suggest doing all the tests on current master, by only
tweaking the replacement_sort_tuples value? I've been testing master vs.
your patch, but I guess setting replacement_sort_tuples=0 should have
the same effect.
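The resulting test matrix is small enough to enumerate. This is a minimal sketch, assuming the benchmark driver issues per-session SET statements before each query. The attached script is not reproduced here, so `WORK_MEM_VALUES`, `session_settings` and `benchmark_matrix` are hypothetical names. The sketch also assumes that replacement_sort_tuples = 0 is equivalent to running with the patch.

```python
from itertools import product

# work_mem values agreed on above (hypothetical constant name).
WORK_MEM_VALUES = ["1MB", "4MB", "8MB", "32MB"]

# "before": limit high enough that it never prevents replacement selection;
# "after": 0, approximating a build with replacement selection removed.
REPLACEMENT_SORT_TUPLES = {"before": 1_000_000_000, "after": 0}


def session_settings(work_mem: str, variant: str) -> str:
    """Return the SET statements to run before each benchmark query."""
    return (
        f"SET work_mem = '{work_mem}'; "
        f"SET replacement_sort_tuples = {REPLACEMENT_SORT_TUPLES[variant]};"
    )


def benchmark_matrix():
    """Yield (work_mem, variant, sql) for every combination to test."""
    for wm, variant in product(WORK_MEM_VALUES, REPLACEMENT_SORT_TUPLES):
        yield wm, variant, session_settings(wm, variant)


if __name__ == "__main__":
    for wm, variant, sql in benchmark_matrix():
        print(f"{wm:>5} {variant:<6} {sql}")
```

Both variants run on the same binary, so any difference between "before" and "after" rows should come from the sort strategy rather than from unrelated changes between master and the patched build.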

I probably won't eliminate the random/DESC data sets, though. At least
not from the two smaller data sets - I want to do a bit of benchmarking
on Heikki's polyphase merge removal patch, and for that patch those data
sets are still relevant. Also, it's useful to have a subset of results
where we know we don't expect any change.

>> For the existing queries, I should have some initial results
>> tomorrow, at least for the data sets with 100k and 1M rows. The
>> tests with 10M rows will take much more time (it takes 1-2 hours for
>> a single work_mem value, and we're testing 6 of them).
>
> I myself don't see that much value in a 10M row test.
>

Meh, more data is probably better. And with the reduced work_mem values
and skipping of random/DESC data sets it should complete much faster.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Re: The case for removing replacement selection sort at 2017-09-11 00:22:12 from Peter Geoghegan

Responses

Re: The case for removing replacement selection sort at 2017-09-11 01:39:20 from Peter Geoghegan
