From: | "David E(dot) Wheeler" <david(at)kineticode(dot)com> |
---|---|
To: | Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
Subject: | Re: PATCH: CITEXT 2.0 |
Date: | 2008-07-07 20:15:03 |
Message-ID: | 8E2D49F2-E366-4504-9428-0AB6F35468FA@kineticode.com |
Lists: | pgsql-hackers |
On Jul 7, 2008, at 12:46, Zdenek Kotala wrote:
>> So, the upshot is that the = and <> operators are not locale-aware,
>> yes? They just do byte comparisons. Is that really the way it
>> should be? I mean, could there not be strings that are equivalent
>> but have different bytes?
>
> Correct. The problem is complex. It works fine only for normalized
> strings. But Postgres currently assumes that all UTF-8 strings are normalized.
I see. So binary equivalence is okay, in that case.
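To make the normalization caveat concrete, here is a minimal C sketch (an illustration, not code from the patch) of two canonically equivalent spellings of "é": the precomposed U+00E9 and "e" followed by U+0301 COMBINING ACUTE ACCENT. A plain byte-wise comparison, assumed here to approximate what the = and <> operators do, treats them as different strings. The names `precomposed`, `decomposed`, and `bytes_equal` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* "é" as one precomposed code point, U+00E9 (NFC form), encoded in UTF-8: 2 bytes */
static const char precomposed[] = "\xC3\xA9";

/* "é" as "e" + U+0301 COMBINING ACUTE ACCENT (NFD form), encoded in UTF-8: 3 bytes */
static const char decomposed[] = "e\xCC\x81";

/* Hypothetical byte-wise equality check: equal only when the lengths
 * match and every byte matches. No normalization is performed, so
 * canonically equivalent strings with different bytes compare unequal. */
static int bytes_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

Normalizing both inputs to the same form (for example NFC) before comparing would make them byte-identical; per the reply above, the server instead assumes its UTF-8 input is already normalized.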
> If you need to implement the < <= >= > operators, you need to use strcoll,
> which takes care of locale collation.
Which varstr_cmp() does, I guess. It's what textlt uses, for example.
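For ordering, a comparator can defer to strcoll(), which honours the current LC_COLLATE setting. The sketch below is a hypothetical helper, `collate_cmp`, loosely modelled on the tie-break PostgreSQL's varstr_cmp() is understood to apply: when strcoll() reports two non-identical strings as equal, it falls back to strcmp(), so that "equal" under the ordering operators still agrees with byte-wise equality. It is not the actual varstr_cmp() code, which works on length-counted text and handles further details.

```c
#include <string.h>

/* Hypothetical locale-aware comparator.
 * strcoll() orders according to the current LC_COLLATE locale; if it
 * claims two different strings are equal, strcmp() breaks the tie so
 * that a result of 0 means the strings are byte-for-byte identical. */
static int collate_cmp(const char *a, const char *b)
{
    int result = strcoll(a, b);

    if (result == 0)
        result = strcmp(a, b);  /* deterministic byte-wise tie-break */
    return result;
}
```

A program has to call setlocale(LC_COLLATE, "") (from <locale.h>) to pick up the environment's collation; until then the "C" locale is in effect and strcoll() orders strings the same way strcmp() does.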
> See unicode collation algorithm http://www.unicode.org/reports/tr10/
Wow, that looks like a fun read.
Best,
David