Re: PATCH: CITEXT 2.0

From:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
To:	Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc:	pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject:	Re: PATCH: CITEXT 2.0
Date:	2008-07-07 20:15:03
Message-ID:	8E2D49F2-E366-4504-9428-0AB6F35468FA@kineticode.com
Lists:	pgsql-hackers

On Jul 7, 2008, at 12:46, Zdenek Kotala wrote:

>> So, the upshot is that the = and <> operators are not locale-aware,
>> yes? They just do byte comparisons. Is that really the way it
>> should be? I mean, could there not be strings that are equivalent
>> but have different bytes?
>
> Correct. The problem is complex. It works fine only for normalized
> strings. But Postgres currently assumes that all UTF-8 strings are normalized.

I see. So binary equivalence is okay, in that case.
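
To make the concern concrete, here is a rough sketch of what I had in mind. It is Python rather than backend C, and the helper names bytes_equal and nfc_equal are made up for illustration; nothing here is from the patch or from the Postgres source. It shows two spellings of the same character that differ at the byte level but compare equal once both are normalized to NFC:

```python
import unicodedata

# Two spellings of "é": precomposed U+00E9 versus "e" plus combining acute U+0301.
composed = "\u00e9"
decomposed = "e\u0301"


def bytes_equal(a: str, b: str) -> bool:
    """Hypothetical helper: byte-wise equality of the UTF-8 encodings."""
    return a.encode("utf-8") == b.encode("utf-8")


def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: byte-wise equality after normalizing both to NFC."""
    return bytes_equal(
        unicodedata.normalize("NFC", a),
        unicodedata.normalize("NFC", b),
    )


if __name__ == "__main__":
    print(bytes_equal(composed, decomposed))  # raw bytes differ
    print(nfc_equal(composed, decomposed))    # equal once normalized
```

If every stored string is already in one normalization form, the two checks agree, which is why a plain byte comparison would be enough for = and <>.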

> If you need to implement < <= >= > operators you need to use strcoll
> which takes care of locale collation.

Which varstr_cmp() does, I guess. It's what textlt uses, for example.
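
As a sketch of the split between equality and ordering (again Python, not the backend code; collation_lt is a made-up name, and this uses Python's locale.strcoll wrapper rather than varstr_cmp itself), ordering goes through the locale's collation while equality can stay a byte comparison. I pin LC_COLLATE to "C" here only because that is the one locale I can assume exists everywhere; under another locale the order may well differ:

```python
import functools
import locale

# Assumption: "C" is used because it is the only locale guaranteed to be
# available; a real deployment would use the database's own locale.
locale.setlocale(locale.LC_COLLATE, "C")


def collation_lt(a: str, b: str) -> bool:
    """Hypothetical helper: 'a < b' as decided by the current LC_COLLATE."""
    return locale.strcoll(a, b) < 0


def collation_sort(words):
    """Sort using strcoll as the comparison function."""
    return sorted(words, key=functools.cmp_to_key(locale.strcoll))


if __name__ == "__main__":
    # In the "C" locale uppercase ASCII sorts before lowercase.
    print(collation_sort(["apple", "Banana", "cherry"]))
```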

> See unicode collation algorithm http://www.unicode.org/reports/tr10/

Wow, that looks like a fun read.

Best,

David

In response to

Re: PATCH: CITEXT 2.0 at 2008-07-07 19:46:55 from Zdenek Kotala
