Re: Re: Abbreviated keys for Datum tuplesort

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Abbreviated keys for Datum tuplesort
Date: 2015-04-02 23:08:13
Message-ID: CAM3SWZTiZt+dfvkYArHnYvjmc6ewuQyXawp4ZWfghhVaXMjP-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 2, 2015 at 11:21 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Apr 2, 2015 at 5:34 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> I think it's explained by the pre-check for sorted input making the
>> number of comparisons exactly n -1. As I pointed out to Tomas, if you
>> put a single, solitary unsorted element at the end, the abbreviated
>> version is then 8x faster (maybe that was in relation to a slightly
>> different case, but the pattern is the same). So this case isn't an
>> argument against datum abbreviation, or even abbreviation in general,
>> but rather an argument against using strxfrm() in general (which for
>> example the GCC docs strongly recommend for sorting lists of strings).
>> It's a bad argument, IMV. This sort is already extremely fast.
>
> OK, I see.

Wait. It's also due to the fact that some cases are spuriously
aborted, because the cost model for text needs to be fixed, I believe.
It's not clear that this is actually such a case from Tomas' remarks,
because the smaller scale tested *did* win significantly, and that
appeared to be otherwise comparable to the regressed cases. So
although what I describe here about perfectly pre-sorted input is
something I've seen (and something that we discussed on another thread
IIRC), this is not an example of it. Again, this is just an example of
why we need to fix the cost model for text.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2015-04-02 23:46:12 Unused variable in hashpage.c
Previous Message Peter Geoghegan 2015-04-02 23:02:59 Re: Re: Abbreviated keys for Datum tuplesort