Re: PATCH: CITEXT 2.0

From:	Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject:	Re: PATCH: CITEXT 2.0
Date:	2008-07-07 19:46:55
Message-ID:	487272AF.5070002@sun.com
Lists:	pgsql-hackers

David E. Wheeler wrote:
> On Jul 7, 2008, at 12:21, David E. Wheeler wrote:
>
>> My question is: why? Shouldn't they all use the same function for
>> comparison? I'm happy to dupe this implementation for citext, but I
>> don't understand it. Should not all comparisons be executed consistently?
>
> Let me try to answer my own question by citing this comment:
>
> /*
>  * Since we only care about equality or not-equality, we can avoid all the
>  * expense of strcoll() here, and just do bitwise comparison.
>  */
>
> So, the upshot is that the = and <> operators are not locale-aware, yes?
> They just do byte comparisons. Is that really the way it should be? I
> mean, could there not be strings that are equivalent but have different
> bytes?

Correct. The problem is complex. Bitwise comparison works correctly only for
normalized strings, and Postgres currently assumes that all UTF-8 strings are
normalized.
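
A minimal sketch of why normalization matters for bitwise equality, written in Python rather than as PostgreSQL code. The helper names bytes_equal and normalized_equal and the sample strings are made up for illustration; the choice of NFC as the normalization form is likewise an assumption:

```python
import unicodedata

# Two canonically equivalent spellings of "é" (illustrative sample data).
composed = "\u00e9"      # U+00E9, precomposed character
decomposed = "e\u0301"   # U+0065 followed by U+0301 combining acute accent


def bytes_equal(a: str, b: str) -> bool:
    """Compare the UTF-8 encodings byte for byte, like a bitwise '=' would."""
    return a.encode("utf-8") == b.encode("utf-8")


def normalized_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Bring both strings to the same normalization form, then compare bytes."""
    return bytes_equal(unicodedata.normalize(form, a),
                       unicodedata.normalize(form, b))


print(bytes_equal(composed, decomposed))       # False: different byte sequences
print(normalized_equal(composed, decomposed))  # True: equal once normalized
```

So the byte comparison gives the expected answer only if both inputs were already stored in the same normalization form.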

If you need to implement the <, <=, >= and > operators, you need to use
strcoll(), which takes care of locale collation.

See the Unicode Collation Algorithm: http://www.unicode.org/reports/tr10/
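
To illustrate the difference between byte order and locale collation for the ordering operators, here is a rough Python sketch. Python's locale.strcoll wraps the C library's strcoll(); the helper names bytewise_sorted and locale_sorted and the word list are hypothetical, and the locale-aware result depends on whichever LC_COLLATE setting is active in the process:

```python
import functools
import locale


def bytewise_sorted(words):
    """Order strings by their raw UTF-8 bytes, ignoring any locale."""
    return sorted(words, key=lambda w: w.encode("utf-8"))


def locale_sorted(words):
    """Order strings with strcoll(), so the active LC_COLLATE decides."""
    return sorted(words, key=functools.cmp_to_key(locale.strcoll))


words = ["zebra", "Apple", "apple"]  # illustrative data
print(bytewise_sorted(words))  # ['Apple', 'apple', 'zebra']
print(locale_sorted(words))    # result depends on the active collation locale
```

Under the default "C" locale both functions agree on this input; after something like locale.setlocale(locale.LC_COLLATE, "") with a language-specific locale in the environment, the two orderings can differ.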

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

In response to

Re: PATCH: CITEXT 2.0 at 2008-07-07 19:26:05 from David E. Wheeler

Responses

Re: PATCH: CITEXT 2.0 at 2008-07-07 20:15:03 from David E. Wheeler
