Re: 8.0, UTF8, and CLIENT_ENCODING

From: Paul Ramsey <pramsey(at)refractions(dot)net>
To: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: 8.0, UTF8, and CLIENT_ENCODING
Date: 2007-05-17 23:55:51
Message-ID: D84BEF92-179D-4197-A686-FA80DA8B7961@refractions.net
Lists: pgsql-general

Thanks all for the information. Summary is:

- 8.0 wasn't very strict, and allowed the illegal values in, instead
of mapping them over into UTF-8 space
- the values can be stripped with iconv -c
- 8.2 should be more strict
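
For anyone who would rather do the stripping in a script than on the command line, here is a minimal sketch in Python of roughly what `iconv -c` does when both sides are UTF-8: bytes that do not form valid UTF-8 are dropped and everything else is kept. The function name `strip_invalid_utf8` and the sample string are made up for illustration, and this discards data rather than repairing it.

```python
def strip_invalid_utf8(raw: bytes) -> bytes:
    """Drop bytes that are not part of a valid UTF-8 sequence; keep the rest."""
    # errors="ignore" silently skips undecodable bytes instead of raising.
    return raw.decode("utf-8", errors="ignore").encode("utf-8")


# Hypothetical sample: 0xE9 is e-acute in LATIN1, but a lone 0xE9 byte
# is illegal in UTF-8 (it announces a multi-byte sequence that never comes).
sample = b"caf\xe9 ok"
cleaned = strip_invalid_utf8(sample)
print(cleaned)
```

Already-valid UTF-8 passes through unchanged, so running this over a dump a second time should be harmless.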

I'm in the midst of my upgrade to 8.2 now; hopefully the LATIN1->UTF8
conversion will map the odd characters cleanly into UTF-8 space.
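
To make concrete what "mapping into UTF-8 space" means, here is a small Python sketch, not anything PostgreSQL itself runs: each LATIN1 byte above 127 becomes a two-byte UTF-8 sequence once it is decoded as LATIN1 and re-encoded. The helper names are hypothetical. It assumes the input really is ISO-8859-1; if the bytes came from Windows-1252 instead, the `cp1252` codec would be the one to try.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Reinterpret raw bytes as LATIN1 (ISO-8859-1) and re-encode as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: a single LATIN1 e-acute byte.
raw = b"\xe9"
converted = latin1_to_utf8(raw)

print(is_valid_utf8(raw))        # the lone high byte is not legal UTF-8
print(converted)                 # the two-byte UTF-8 form of the same character
print(is_valid_utf8(converted))
```

Plain ASCII bytes come out unchanged, which is why only the high-bit characters were affected.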

On 17-May-07, at 3:25 PM, Michael Glaesemann wrote:

>
> On May 17, 2007, at 16:47 , PFC wrote:
>
>>> and put that in the form. Instead of being mapped to 2-byte UTF8
>>> high-bit equivalents, they are going into the database directly
>>> as one-byte values > 127. That is, as illegal UTF8 values.
>>
>> Sometimes you also get HTML entities in the mix. Who knows.
>> All my web forms are UTF-8 back to back, it just works. Was I
>> lucky?
>> Normally postgres rejects illegal UTF8 values, you wouldn't be
>> able to insert them...
>
> 8.0 and earlier weren't quite as strict as they should have been. See
> the note at the end of the migration instructions in the release
> notes for 8.1[1]. That may have been part of the issue here.
>
> Michael Glaesemann
> grzm seespotcode net
>
> [1] http://www.postgresql.org/docs/8.2/interactive/release-8-1.html#AEN80196
