Re: Unicode Normalization

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	"David E. Wheeler" <david(at)kineticode(dot)com>
Cc:	pg1(at)thetdh(dot)com, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Unicode Normalization
Date:	2009-09-24 15:59:09
Message-ID:	4ABB974D.5000104@dunslane.net
Lists:	pgsql-hackers

David E. Wheeler wrote:
> On Sep 24, 2009, at 6:24 AM, pg(at)thetdh(dot)com wrote:
>
>> In a context using normalization, wouldn't you typically want to
>> store a normalized-text type that could perhaps (depending on locale)
>> take advantage of simpler, more-efficient comparison functions?
>
> That might be nice, but I'd be wary of a geometric multiplication of
> text types. We already have TEXT and CITEXT; what if we had your NTEXT
> (normalized text) but I wanted it to also be case-insensitive?

Actually, I don't think it's necessarily a good idea at all. If a user
inputs a perfectly valid piece of UTF-8 text, we should be able to give
it back to them exactly, whether or not it's in normalized form. The
normalized forms are useful for certain comparison purposes, but they
don't affect the validity of the text. CITEXT doesn't mangle what is
stored, just how it's compared.
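
To make the distinction concrete, here is a minimal sketch in Python (not PostgreSQL code). It assumes NFC as the comparison form, and the helper name normalized_equal is hypothetical. The stored value is left exactly as the user supplied it; normalization is applied only at comparison time.

```python
import unicodedata

def normalized_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings by their normalized forms without altering either.

    Hypothetical helper for illustration; NFC is an assumed default.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# The same visible text, an e with an acute accent, encoded two ways.
composed = "\u00e9"      # precomposed U+00E9
decomposed = "e\u0301"   # "e" followed by combining acute accent U+0301

stored = decomposed      # keep what the user gave us, unmodified

assert stored != composed                  # raw comparison sees different code points
assert normalized_equal(stored, composed)  # normalized comparison treats them as equal
assert stored == "e\u0301"                 # the stored value is still the original input
```

This mirrors the CITEXT approach described above: the comparison changes, the stored data does not.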

cheers

andrew

In response to

Re: Unicode Normalization at 2009-09-24 15:36:37 from David E. Wheeler

Responses

Re: Unicode Normalization at 2009-09-24 16:05:58 from David E. Wheeler
