Re: Unicode normalization

From:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To:	Sam Mason <sam(at)samason(dot)me(dot)uk>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Unicode normalization
Date:	2009-09-16 23:01:40
Message-ID:	dcc563d10909161601v1991b152q6bd2c2a829a511dd@mail.gmail.com
Lists:	pgsql-general

On Wed, Sep 16, 2009 at 4:42 PM, Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
> On Wed, Sep 16, 2009 at 09:35:02PM +0200, Andreas Kalsch wrote:
>> CREATE OR REPLACE FUNCTION test (str text)
>> RETURNS text
>> AS $$
>> import unicodedata
>> return unicodedata.normalize('NFKD', str.decode('UTF-8'))
>> $$ LANGUAGE plpythonu;
>
> I'd guess you want that to be:
>
> return unicodedata.normalize('NFKD', str.decode('UTF-8')).encode('UTF-8')
>
> If you're converting from a UTF-8 encoding, you probably need to go
> back again! This could certainly be made easier, though. PG knows what
> encoding its strings are stored in, so why doesn't it work with unicode
> strings by default?

Isn't it Python that's making the mistake here, not PG?
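
For reference, here is a rough sketch of the decode / normalize / encode round trip Sam describes. It assumes standalone Python 3 rather than plpythonu (which was Python 2 at the time), and the helper name normalize_utf8 is made up for illustration:

```python
import unicodedata

def normalize_utf8(raw: bytes, form: str = "NFKD") -> bytes:
    """Hypothetical helper: normalize UTF-8 encoded bytes and return UTF-8 bytes."""
    # Step 1: bytes -> Unicode string (the str.decode('UTF-8') call in the thread).
    text = raw.decode("utf-8")
    # Step 2: apply the requested normalization form to the Unicode string.
    normalized = unicodedata.normalize(form, text)
    # Step 3: Unicode string -> bytes again (the .encode('UTF-8') that was missing).
    return normalized.encode("utf-8")

# U+00E9 (precomposed e-acute) decomposes into 'e' plus U+0301 under NFKD.
composed = "\u00e9".encode("utf-8")
decomposed = normalize_utf8(composed)
print(decomposed == "e\u0301".encode("utf-8"))
```

Without the final encode step the function hands back a Unicode object instead of an encoded byte string, which is the mismatch the quoted fix addresses.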

In response to

Re: Unicode normalization at 2009-09-16 22:42:58 from Sam Mason
