Re: Using quicksort for every external sort run

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)heroku(dot)com>
Cc:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Using quicksort for every external sort run
Date:	2016-04-07 18:10:58
Message-ID:	CA+TgmobfJGNg8wojiJgv42xrsP8op0DMVYjt2XjoiGdn3+4-gQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 7, 2016 at 1:17 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> I certainly agree that GUCs that aren't easy to tune are bad. I'm
>> wondering whether the fact that this one is hard to tune is something
>> that can be fixed. The comments about "padding" - a term I don't
>> like, because to me it implies a deliberate attempt to game the
>> benchmark when in reality wanting to sort a wide row is entirely
>> reasonable - make me wonder if this should be based on a number of
>> tuples rather than an amount of memory. If considering the row width
>> makes us get the wrong answer, then let's not do that.
>
> That's a good point. While I don't think it will make it easy to tune
> the GUC, it will make it easier. Although, I think that it should
> probably still be GUC_UNIT_KB. That should just be something that my
> useselection() function compares to the overall size of memtuples
> alone when we must initially spill, not the value of
> work_mem/maintenance_work_mem. The degree of padding isn't entirely
> irrelevant, because not all comparisons will be resolved at the
> stup.datum1 level, but it's still clearly an improvement to not have
> wide tuples mess with things.
>
> Would you like me to revise the patch along those lines? Or, do you
> prefer units of tuples? Tuples are basically equivalent, but make it
> way less obvious what the relationship with CPU cache might be. If I
> revise the patch along these lines, I should also reduce the default
> replacement_sort_mem to produce roughly equivalent behavior for
> non-padded cases.

I prefer units of tuples, with the GUC itself therefore being
unitless. I suggest we call the parameter replacement_sort_threshold
and document that (1) the ideal value may depend on the amount of CPU
cache available to running processes, with more cache implying higher
values; and (2) the ideal value may depend somewhat on the input data,
with more correlation implying higher values. And then pick some
value that you think is likely to work well for most people and call
it good.
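
As a rough sketch of what a tuple-count threshold could look like (this is
not code from the patch): the variable name follows the parameter name
suggested above, but the helper use_replacement_selection(), its signature,
the illustrative value of 100000, and the direction of the comparison
(replacement selection at or below the threshold, quicksort above it) are
all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the proposed unitless GUC.  The value is
 * arbitrary and chosen only for illustration; it is not a recommended
 * default.
 */
static int replacement_sort_threshold = 100000;

/*
 * Hypothetical helper, called when the sort must first spill to disk.
 * It looks only at how many tuples are held in memory, so the width of
 * each row has no influence on the decision.
 *
 * Assumed convention: at or below the threshold, use replacement
 * selection for the first run; above it, quicksort every run.
 */
static bool
use_replacement_selection(size_t memtupcount)
{
    assert(replacement_sort_threshold >= 0);
    return memtupcount <= (size_t) replacement_sort_threshold;
}
```

With a count-based test like this, a sort of wide rows and a sort of narrow
rows with the same number of in-memory tuples get the same answer, which is
the property being asked for.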

If you could prepare a new patch with those changes that also makes
the changes requested in my other email, I will try to commit that
before the deadline. Thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Using quicksort for every external sort run at 2016-04-07 17:17:19 from Peter Geoghegan

Responses

Re: Using quicksort for every external sort run at 2016-04-07 22:23:23 from Peter Geoghegan
Re: Using quicksort for every external sort run at 2016-04-08 03:39:43 from Peter Geoghegan