Re: Unicode normalization

From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Unicode normalization
Date: 2009-09-16 22:42:58
Message-ID: 20090916224258.GM5407@samason.me.uk
Lists: pgsql-general

On Wed, Sep 16, 2009 at 09:35:02PM +0200, Andreas Kalsch wrote:
> CREATE OR REPLACE FUNCTION test (str text)
> RETURNS text
> AS $$
> import unicodedata
> return unicodedata.normalize('NFKD', str.decode('UTF-8'))
> $$ LANGUAGE plpythonu;

I'd guess you want that to be:

return unicodedata.normalize('NFKD', str.decode('UTF-8')).encode('UTF-8')

If you decode from UTF-8 on the way in, you probably need to encode
back to UTF-8 on the way out! This could certainly be made easier,
though: PG knows what encoding its strings are stored in, so why
doesn't it work with unicode strings by default?
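
For what it's worth, here is a minimal sketch of that round trip in
plain Python 3, outside the database. The helper name normalize_utf8
is made up for illustration, and it assumes the input really is
UTF-8 encoded bytes:

```python
import unicodedata

def normalize_utf8(raw: bytes, form: str = "NFKD") -> bytes:
    # Decode the UTF-8 bytes into a text string so unicodedata can work on it.
    text = raw.decode("UTF-8")
    # Apply the requested normalization form to the decoded text.
    normalized = unicodedata.normalize(form, text)
    # Encode back to UTF-8 so the caller gets the same kind of value it passed in.
    return normalized.encode("UTF-8")

# U+00E9 (e with acute) decomposes under NFKD into "e" plus U+0301 (combining acute).
print(normalize_utf8("\u00e9".encode("UTF-8")))  # prints b'e\xcc\x81'
```

Leaving off the final encode step would hand back a text string
instead of bytes, which is the mismatch suspected in the quoted
function.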

--
Sam http://samason.me.uk/
