Quick Links

Re: B-Tree support function number 3 (strxfrm() optimization)

From:	Peter Geoghegan <pg(at)heroku(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: B-Tree support function number 3 (strxfrm() optimization)
Date:	2014-09-12 18:58:46
Message-ID:	CAM3SWZTyi30CDp7soy23axGEAEnxME1kz3KKdmr=v5g8=S8WMA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Sep 12, 2014 at 11:38 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Based on discussion thus far it seems that there's a possibility that
> the trade-off may be different for short strings vs. long strings. If
> the string is small enough to fit in the L1 CPU cache, then it may be
> that memcmp() followed by strcoll() is not much more expensive than
> strcoll(). That should be easy to figure out: write a standalone C
> program that creates a bunch of arbitrary, fairly-short strings, say
> 32 bytes, in a big array.

While I think that's fair, the reason I didn't bother playing tricks
with only doing a (purely) opportunistic memcmp() when the string size
is under (say) CACHE_LINE_SIZE bytes is that in order for it to matter
you'd have to have a use case where the first CACHE_LINE_SIZE of bytes
matched, and the string just happened to be identical in length, but
also ultimately differed at least a good fraction of the time. That
seems like the kind of thing that it's okay to care less about. That
might have been regressed worse than what you've seen already. It's
narrow in a whole new dimension, though. The intersection of that
issue, and the issues exercised by Heikki's existing test case must be
exceedingly rare.

I'm still confused about whether or not we're talking at cross
purposes here, Robert. Are you happy to consider this as a separate
and additional question to the question of what to do in an
abbreviated comparison tie-break? The correlated multiple sort key
attributes case strikes me as very common - it's a nice to have, and
will sometimes offer a nice additional boost. On the other hand, doing
this for abbreviated comparison tie-breakers is more or less
fundamental to the patch. In my cost model, a memcmp() abbreviated key
tie-breaker than works out is equivalent to an abbreviated comparison.
This is a bit of a fudge, but close enough.

BTW, I do appreciate your work on this. I realize that if you didn't
give this patch a fair go, there is a chance that no one else would.

--
Peter Geoghegan

In response to

Re: B-Tree support function number 3 (strxfrm() optimization) at 2014-09-12 18:38:58 from Robert Haas

Responses

Re: B-Tree support function number 3 (strxfrm() optimization) at 2014-09-12 19:02:27 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2014-09-12 19:02:27	Re: B-Tree support function number 3 (strxfrm() optimization)
Previous Message	Robert Haas	2014-09-12 18:56:04	Re: Support for N synchronous standby servers