From: | Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: strcmp() tie-breaker for identical ICU-collated strings |
Date: | 2017-06-02 17:34:45 |
Message-ID: | CAJ3gD9fVfc-5H-MbH=JdL=QqvMV_dQyraOvZYhN56vt7X4LeOg@mail.gmail.com |
Lists: | pgsql-hackers |
On 2 June 2017 at 03:18, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Fri, Jun 2, 2017 at 9:27 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> On Thu, Jun 1, 2017 at 2:24 PM, Thomas Munro
>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>> Why should ICU be any different than the system provider in this
>>> respect? In both cases, we have a two-level comparison: first we use
>>> the collation-aware comparison, and then as a tie breaker, we use a
>>> binary comparison. If we didn't do a binary comparison as a
>>> tie-breaker, wouldn't the result be logically incompatible with the =
>>> operator, which does a binary comparison?
OK. I had thought we do the tie-breaker specifically because
strcoll_l() was unexpectedly returning 0 in some cases. Now I see
that we do it to stay consistent with texteq().
Secondly, I was also wondering whether ICU in particular has a way to
customize a locale by setting attributes that dictate comparison or
sorting rules for a set of characters. If such a customized ICU locale
is defined on the system and we use it to create a PG collation, I
thought we might have to follow those rules strictly, without a
tie-breaker, so as to be 100% conformant to ICU. I can't come up with
a concrete example, and maybe there isn't one, but say there is a
locale that is supposed to sort only at the lowest comparison strength
(de@strength=1 ??). In that case many characters might be considered
equal by the collation, but the PG < or > operator would still return
true for them.
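
To make the concern concrete, here is a minimal sketch of the two-level
comparison, not the actual PostgreSQL code. primary_strength_cmp() is a
hypothetical stand-in for a low-strength collation (it just ignores
ASCII case); a real ICU collator would take its place. The function
names are made up for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for a primary-strength collation: compares
 * ASCII strings ignoring case, so "abc" and "ABC" compare as equal.
 */
static int
primary_strength_cmp(const char *a, const char *b)
{
    for (;; a++, b++)
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return (ca < cb) ? -1 : 1;
        if (ca == 0)
            return 0;           /* both strings ended together */
    }
}

/* Two-level comparison: collation first, then a binary tie-breaker. */
static int
cmp_with_tiebreak(const char *a, const char *b)
{
    int result = primary_strength_cmp(a, b);

    if (result == 0)
        result = strcmp(a, b);  /* only bytewise-identical strings tie */
    return result;
}

/* Binary equality, analogous in spirit to what texteq() does. */
static bool
binary_eq(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

With this sketch, "abc" and "ABC" are equal according to the stand-in
collation, yet cmp_with_tiebreak() orders one before the other. That
keeps < and > consistent with binary_eq(), but it is exactly the case
where the result differs from what the collation alone would say.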
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company