Re: strcmp() tie-breaker for identical ICU-collated strings

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: strcmp() tie-breaker for identical ICU-collated strings
Date:	2017-06-01 21:48:17
Message-ID:	19269.1496353697@sss.pgh.pa.us
Lists:	pgsql-hackers

Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> On Thu, Jun 1, 2017 at 2:24 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> Why should ICU be any different than the system provider in this
>> respect? In both cases, we have a two-level comparison: first we use
>> the collation-aware comparison, and then as a tie breaker, we use a
>> binary comparison. If we didn't do a binary comparison as a
>> tie-breaker, wouldn't the result be logically incompatible with the =
>> operator, which does a binary comparison?

> I agree with that assessment.

The critical reason why this is not optional is that if texteq were to
return true for strings that aren't bitwise identical, that would break hashing
--- unless you can guarantee that the hash values for such strings will be
equal anyway. That's hardly possible when we don't even know what the
collation's comparison rule is, and would likely be difficult even if
we had complete knowledge.

So no, we're not going there for ICU any more than we did for libc.

regards, tom lane

In response to

Re: strcmp() tie-breaker for identical ICU-collated strings at 2017-06-01 21:27:08 from Peter Geoghegan
