| From: | "David E(dot) Wheeler" <david(at)kineticode(dot)com> | 
|---|---|
| To: | pg1(at)thetdh(dot)com | 
| Cc: | "PG Hackers" <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Unicode Normalization | 
| Date: | 2009-09-24 15:36:37 | 
| Message-ID: | 9BD6C83B-018E-4263-9EC8-33344FEDF655@kineticode.com | 
| Lists: | pgsql-hackers | 
On Sep 24, 2009, at 6:24 AM, pg(at)thetdh(dot)com wrote:
> In a context using normalization, wouldn't you typically want to  
> store a normalized-text type that could perhaps (depending on  
> locale) take advantage of simpler, more-efficient comparison  
> functions?
That might be nice, but I'd be wary of a geometric multiplication of  
text types. We already have TEXT and CITEXT; what if we had your NTEXT  
(normalized text) but I wanted it to also be case-insensitive?
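To illustrate the concern, here is a minimal Python sketch (not PostgreSQL code) that treats normalization and case-insensitivity as independent steps in a comparison key, rather than as separate text types. The function `comparison_key` and its parameters are hypothetical names invented for this illustration; only `unicodedata.normalize` and `str.casefold` are real standard-library calls.

```python
import unicodedata


def comparison_key(s: str, *, normalize: bool = True, ignore_case: bool = False) -> str:
    """Build a comparison key from two independent, optional steps.

    Hypothetical helper: each flag corresponds to one property that would
    otherwise need its own text type (NTEXT, CITEXT, or a combination).
    """
    if normalize:
        s = unicodedata.normalize("NFC", s)
    if ignore_case:
        s = s.casefold()
        if normalize:
            # Case folding can produce non-normalized output, so normalize again.
            s = unicodedata.normalize("NFC", s)
    return s


upper_composed = "\u00c9"      # É as a single code point
lower_decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# Normalized and case-insensitive: the two spellings compare equal.
both = comparison_key(upper_composed, ignore_case=True) == comparison_key(
    lower_decomposed, ignore_case=True
)

# Normalized only: they still differ by case.
normalized_only = comparison_key(upper_composed) == comparison_key(lower_decomposed)
```

With two flags there are already four possible behaviours; as distinct types, each new property would double the count, which is the multiplication worried about above.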
> Whether you're doing INSERT/UPDATE, or importing a flat text file,  
> if you canonicalize characters and substrings of identical meaning  
> when trivial distinctions of encoding are irrelevant, you're better  
> off later.  User-invocable normalization functions by themselves  
> don't make much sense.
Well, they make sense because there's nothing else right now. It's an  
easy way to get some support in, and besides, it's mandated by the SQL  
standard.
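As a rough illustration of what a user-invocable normalization function buys you, the following Python sketch uses the standard library's `unicodedata.normalize`. It is an analogy only, not the proposed PostgreSQL interface; the variable names are made up for the example.

```python
import unicodedata

composed = "\u00e9"      # é as a single precomposed code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# The two strings render identically but differ code point by code point.
raw_equal = composed == decomposed

# Normalizing both sides to the same form makes them comparable.
nfc_equal = unicodedata.normalize("NFC", composed) == unicodedata.normalize("NFC", decomposed)
nfd_equal = unicodedata.normalize("NFD", composed) == unicodedata.normalize("NFD", decomposed)

# NFC composes, NFD decomposes.
nfc_length = len(unicodedata.normalize("NFC", decomposed))
nfd_length = len(unicodedata.normalize("NFD", composed))
```

A caller can apply such a function at INSERT/UPDATE time or at comparison time, which is why it is useful even without a dedicated normalized-text type.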
> (If Postgres now supports binary- or mixed-binary-and-text flat  
> files, perhaps for restore purposes, the same thing applies.)
Don't follow this bit.
Best,
David