Re: Why is an ISO-8859-8 database allowing values not within that set?

From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Herouth Maoz <herouth(at)unicell(dot)co(dot)il>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Why is an ISO-8859-8 database allowing values not within that set?
Date: 2012-07-22 12:07:53
Message-ID: 500BED19.1050208@ringerc.id.au
Lists: pgsql-general

On 07/22/2012 03:58 PM, Herouth Maoz wrote:
> Thanks. That makes sense. The default client encoding on the reports
> database is ISO-8859-8, so I guess when I don't set it using
> \encoding, it does exactly what you say.
>
> OK, so I'm still looking for a way to convert illegal characters into
> something that won't collide with my encoding (asterisks or whatever).
>

As far as I know, PostgreSQL's encoding handling functions do not offer
substitution for unsupported characters, nor does the built-in
client<->server charset translation feature. You could do it with a
regular expression replacement of any character not in a class that
contains every character valid in the target encoding. It feels like a
very clunky approach, though.
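
To make the regex idea concrete, here is a minimal sketch in plain Python
rather than SQL, purely for illustration. The helper names
(valid_chars, replace_unsupported) and the asterisk substitute are
assumptions of mine, not anything PostgreSQL provides. It builds the
"every valid char" class by decoding each single byte with Python's
iso8859_8 codec, then replaces anything outside that class.

```python
import re

TARGET = "iso8859_8"  # Python's codec name for ISO-8859-8


def valid_chars(encoding):
    """Return every character a single byte can decode to in `encoding`."""
    chars = []
    for b in range(256):
        try:
            chars.append(bytes([b]).decode(encoding))
        except UnicodeDecodeError:
            # This byte value is unassigned in the encoding; skip it.
            pass
    return chars


# Negated character class: matches any character NOT valid in the target.
_INVALID = re.compile(
    "[^" + "".join(re.escape(c) for c in valid_chars(TARGET)) + "]"
)


def replace_unsupported(text, substitute="*"):
    """Replace each character outside the target encoding with `substitute`."""
    return _INVALID.sub(substitute, text)


if __name__ == "__main__":
    # U+00E9 and U+20AC are not in ISO-8859-8; the Hebrew letters are.
    print(replace_unsupported("caf\u00e9 \u20ac \u05e9\u05dc\u05d5\u05dd"))
```

The same class could in principle be fed to a regular expression
replacement on the server side, but I have not tried that, and the class
is only right for a single-byte target encoding like this one.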

An alternative is to use a procedural language that DOES support lossy
character encoding conversions. I don't think plpython does and plpgsql
certainly doesn't if PostgreSQL itself doesn't. I'd be amazed if
plperl didn't support lossy conversions, but I haven't done much with
Perl in years.
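
For what a lossy conversion looks like in practice, here is a small
sketch using standalone Python's codecs module, which does accept an
error handler on encode. Whether this helps inside plpython depends on
how the function's result gets converted back to the server encoding,
which I have not tested. The handler name "asterisk" and the function
to_iso8859_8_lossy are made up for this example.

```python
import codecs


def _asterisk_handler(exc):
    """Error handler: emit one '*' per character that cannot be encoded."""
    if isinstance(exc, UnicodeEncodeError):
        # Return the replacement text and the position to resume from.
        return ("*" * (exc.end - exc.start), exc.end)
    raise exc


# Register under a hypothetical name so str.encode() can refer to it.
codecs.register_error("asterisk", _asterisk_handler)


def to_iso8859_8_lossy(text):
    """Encode to ISO-8859-8, substituting '*' for unsupported characters."""
    return text.encode("iso8859_8", errors="asterisk")


if __name__ == "__main__":
    # The euro sign is not in ISO-8859-8, so it becomes an asterisk.
    print(to_iso8859_8_lossy("a\u20acb"))
    # The built-in "replace" handler does the same with '?' instead.
    print("caf\u00e9".encode("iso8859_8", errors="replace"))
```

The built-in errors="replace" handler is enough if a question mark is an
acceptable substitute; the custom handler is only needed to pick a
different character.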

It'd be handy if Pg's client<->server conversion supported lossy
conversions for this kind of thing. Honestly, though, I'm not sad it doesn't,
because it'd be something people would misuse to make the error messages
they didn't understand go away - then come back and complain that
PostgreSQL ate their data later.

--
Craig Ringer
