Re: regexp_replace and UTF8

From: Harald Fuchs <hari(dot)fuchs(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: regexp_replace and UTF8
Date: 2009-01-30 15:47:04
Message-ID: pufxj0g5jb.fsf@srv.protecting.net
Lists: pgsql-sql

In article <87ljstm4eq(dot)fsf(at)oxford(dot)xeocode(dot)com>,
Gregory Stark <stark(at)enterprisedb(dot)com> writes:

> "Bart Degryse" <Bart(dot)Degryse(at)indicator(dot)be> writes:
>> Hi,
>> I have a text field with data like this: 'de pati&#235;nt niet'

>> Can anyone help me fix this or point me to a better approach.
>> By the way, changing the way data is put into the field is
>> unfortunately not an option.

> You could use a plperl function to use one of the many html parsing perl
> modules?

Yes, either plperl or some external HTML tool.
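
As a rough illustration of the "external tool" route: the sketch below uses Python, which is my own choice and not something named in this thread. It relies on the standard library's html.unescape, and decode_entities is just a hypothetical helper name.

```python
import html


def decode_entities(text: str) -> str:
    # html.unescape handles decimal (&#235;), hexadecimal (&#xEB;)
    # and named (&euml;) character references.
    return html.unescape(text)


print(decode_entities('de pati&#235;nt niet'))
```

The same idea should carry over to plperl with one of the HTML entity modules, but I have not tried that here.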

>> Basically what I need to do (I think) is
>> - get rid of the &, # and ;
>> - convert the number to hex
>> - make a UTF8 from that (thus: \xEB)
>> - convert that to SQL_ASCII

Are you aware that SQL_ASCII is a misnomer for "no encoding at all, and I
don't care"? I'd use UTF8 or (if your data stays within Western European
languages) Latin9.
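
To make the encoding point concrete, here is a minimal sketch of the steps quoted above, again in Python (my assumption, not part of the original question). The function name decode_numeric_refs is hypothetical, and it only handles decimal references of the form &#NNN;. Note that 235 (hex EB) is the code point of the character; its UTF8 encoding is two bytes, while in Latin9 it is the single byte EB.

```python
import re


def decode_numeric_refs(text: str) -> str:
    # Replace each decimal reference &#NNN; with the character
    # at that code point; everything else is left untouched.
    return re.sub(r'&#(\d+);', lambda m: chr(int(m.group(1))), text)


decoded = decode_numeric_refs('de pati&#235;nt niet')

# The same character, serialized in the two encodings suggested above.
utf8_bytes = decoded.encode('utf-8')          # e-diaeresis becomes C3 AB
latin9_bytes = decoded.encode('iso8859-15')   # e-diaeresis becomes EB

print(decoded)
print(utf8_bytes)
print(latin9_bytes)
```

With a real database encoding (UTF8 or Latin9) the server can do this conversion for the client; with SQL_ASCII it just stores whatever bytes it is given.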
