From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Date: 2014-09-04 18:12:43
Message-ID: CAM3SWZS3ZcYM7eYZw0dUu6CieJ8W4B3CYv1f=j2PgGxhvr8O_A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 4, 2014 at 9:19 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Sep 2, 2014 at 10:27 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> * Still doesn't address the open question of whether or not we should
>> optimistically always try "memcmp() == 0" on tiebreak. I still lean
>> towards "yes".
>
> Let m be the cost of a memcmp() that fails near the end of the
> strings, and let s be the cost of a strcoll() that does likewise.
> Clearly s > m. But approximately what is s/m on platforms where you
> can test? Say, with 100-byte strings, in a few different locales.

Just to be clear: I imagine you're more or less sold on the idea of
testing equality in the event of a tiebreak, where the leading 8
primary weight bytes are already known to be equal (and the full text
string lengths also match); the theory of operation behind testing how
good a proxy abbreviated key cardinality is for full key cardinality
is very much predicated on that. We can still win big with very low
cardinality sets this way, which are an important case. What I
consider an open question is whether or not we should also attempt the
memcmp() on the first call, when there has been no abbreviated
comparison (such as on the second or subsequent attribute in a
multi-column sort), in the hope that equality will just happen to be
indicated.
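
To pin down the settled part, the tiebreak would look something like
this sketch (illustrative only, over plain NUL-terminated strings; the
real text comparator has to copy varlena data into scratch buffers
first, and none of these names come from the actual patch):

#include <string.h>     /* memcmp(), strcoll(), strlen() */

/*
 * Illustrative tiebreak comparator.  Called when the abbreviated keys
 * (leading primary weight bytes) compared equal, so a full comparison
 * is required to resolve the order.
 */
static int
tiebreak_cmp(const char *a, const char *b)
{
    size_t  alen = strlen(a);
    size_t  blen = strlen(b);

    /*
     * Opportunistic "memcmp() == 0" check: only attempted when the
     * lengths match, since strings of unequal length cannot be
     * bitwise equal.  Bitwise-equal strings must compare as equal
     * under any sane collation, so a hit skips strcoll() entirely.
     */
    if (alen == blen && memcmp(a, b, alen) == 0)
        return 0;

    /* Miss: pay for the full locale-aware comparison after all. */
    return strcoll(a, b);
}

The open question above amounts to: do we also try that memcmp() when
there was no abbreviated comparison beforehand?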

> If for example s/m > 100 then it's a no-brainer, because in the worst
> case we're adding 1% overhead, and in the best case we're saving 99%.
> OTOH, if s/m < 2 then I almost certainly wouldn't do it, because in
> the worst case we're adding >50% overhead, and in the best case we're
> saving <50%. That seems like doubling down on the bet that the
> abbreviated key stuff works nearly all the time, and I'm not prepared
> to make that bet. There is of course a lot of daylight between a
> 2-to-1 ratio and a 100-to-1 ratio, and I expect the real value is
> somewhere in the middle (probably closer to 2); I haven't yet made up
> my mind what value would make this worthwhile, but I'd like to know
> what the real numbers are.

Well, we can only lose when the strings happen to be the same size, so
that's something. But I'm willing to consider the possibility that the
memcmp() is virtually free; I would only proceed with this extra
optimization if that is actually the case. Modern CPUs are odd things:
branch prediction, instruction pipelining, and the fact that we're
frequently stalled on cache misses might combine to make the
opportunistic memcmp() effectively free. I could be wrong about that,
and I'm certainly wrong if you test large enough strings with
differences only towards the very end, but it seems reasonable to
speculate that it would work well with appropriate precautions (in
particular, not doing it when the strings are huge). Let me try to
come up with some numbers for a really unsympathetic case, since
you've already seen sympathetic numbers. I think the sympathetic
country/province/city sort test case [1] is actually fairly
representative; sort keys *are* frequently correlated like that,
implying that there are lots of savings to be had by being
"memcmp() == 0 optimistic" when resolving comparisons using the second
or subsequent attribute.
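
For what it's worth, the shape of test I have in mind for your s/m
question is roughly this (a throwaway sketch; the locale, iteration
count, and 100-byte length are arbitrary choices, and clock()
granularity is crude, but it's good enough for a ratio):

#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

int
main(void)
{
    char        a[101];
    char        b[101];
    volatile int sink = 0;  /* keep the loops from being optimized away */
    const long  iters = 10000000;
    clock_t     t0, t1, t2;
    long        i;

    setlocale(LC_COLLATE, "en_US.UTF-8");   /* vary this per test */

    /* 100-byte strings that differ only in the very last byte */
    memset(a, 'x', 100);
    memset(b, 'x', 100);
    a[99] = 'a';
    b[99] = 'b';
    a[100] = b[100] = '\0';

    t0 = clock();
    for (i = 0; i < iters; i++)
        sink += memcmp(a, b, 100);
    t1 = clock();
    for (i = 0; i < iters; i++)
        sink += strcoll(a, b);
    t2 = clock();
    (void) sink;

    printf("m (memcmp):  %.3f s\n", (double) (t1 - t0) / CLOCKS_PER_SEC);
    printf("s (strcoll): %.3f s\n", (double) (t2 - t1) / CLOCKS_PER_SEC);
    printf("s/m:         %.1f\n", (double) (t2 - t1) / (double) (t1 - t0));
    return 0;
}

That's the worst case for both functions: they each have to scan
nearly the whole string before failing.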

>> * Leaves open the question of what to do when we can't use the
>> abbreviated keys optimization just because a datum tuplesort is
>> preferred when sorting single-attribute tuples (recall that datum
>> case tuplesorts cannot use abbreviated keys). We want to avail
>> ourselves of tuplesort datum sorting where we can, except when
>> abbreviated keys are available, which presumably tips the balance in
>> favor of heap tuple sorting even when sorting on only one attribute,
>> simply because we can then use abbreviated keys. I'm thinking in
>> particular of nodeAgg.c, which is an important case.
>
> I favor leaving this issue to a future patch. The last thing this
> patch needs is more changes that someone might potentially dislike.
> Let's focus on getting the core thing working, and then you can
> enhance it once we all agree that it is.

Makes sense. I think we should make a separate pass to enable sort
support for B-Tree sorting - that's probably the most compelling case,
after all. That's certainly the thing that I've heard complaints
about. There could be as many as 2-3 follow-up commits.

> On the substance of this issue, I suspect that for pass-by-value data
> types it can hardly be wrong to use the datum tuplesort approach; but
> it's possible we will want to disable it for pass-by-reference data
> types when the abbreviated-key infrastructure is available. That will
> lose if it turns out that the abbreviated keys aren't capturing enough
> of the entropy, but maybe we'll decide that's OK. Or maybe not. But
> I don't think it's imperative that this patch make a change in that
> area, and indeed, in the interest of keeping separate changes
> isolated, I think it's better if it doesn't.

Right. I had presumed that we'd want to figure that out each time. I
wasn't sure how best to go about doing that, which is why it's an open
item.
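
If we did decide to figure it out each time, I imagine the policy
would have roughly this shape (a sketch with made-up names, not the
actual tuplesort interface; "have_abbrev" just stands in for however
we'd ask "can this type's sort support abbreviate?"):

#include <stdbool.h>

typedef enum
{
    SORT_DATUM,         /* single-Datum tuplesort */
    SORT_HEAPTUPLE      /* full heap tuple tuplesort */
} SortStrategy;

static SortStrategy
choose_strategy(int nkeys, bool pass_by_value, bool have_abbrev)
{
    if (nkeys > 1)
        return SORT_HEAPTUPLE;  /* datum sort only fits one attribute */

    /* Pass-by-value types can hardly lose with the datum sort. */
    if (pass_by_value)
        return SORT_DATUM;

    /*
     * Pass-by-reference: abbreviated keys are only available in the
     * heap tuple case, which presumably tips the balance that way.
     */
    return have_abbrev ? SORT_HEAPTUPLE : SORT_DATUM;
}

As you say, though, that can wait for a follow-up patch.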

[1] http://www.postgresql.org/message-id/CAM3SWZQTYv3KP+CakZJZV3RwB1OJjaHwPCZ9cOYJXPkhbtcBVg@mail.gmail.com
--
Peter Geoghegan
