Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade
Date: 2018-04-16 16:09:09
Message-ID: 10581.1523894949@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com> writes:
> Running into an issue with helping a client upgrade from 8.3 to 10 (yes, I
> know, please keep the out of support comments to a minimum, thanks :).

> The old database was in SQL_ASCII and it needs to stay that way for now
> unfortunately. The dump and restore itself works fine, but we're now
> running into issues with some data returning encoding errors unless we
> specifically set the client_encoding value to SQL_ASCII.

I'm guessing you might be hitting this 9.1 change:

* Have psql set the client encoding from the operating system locale
by default (Heikki Linnakangas)

This only happens if the PGCLIENTENCODING environment variable is
not set.

I think the previous default was to set client encoding equal to the
server encoding.

> Looking at the 8.3 database, it has the client_encoding value set to UTF8
> and queries seem to work fine. Is this just a bug in the old 8.3 not
> enforcing encoding properly?

Somewhere along the line we made SQL_ASCII -> something else conversions
check that the data was valid per the other encoding, even though no
actual data change happens.

> The other thing I noticed on the 10 instance was that, while the LOCALE was
> set to SQL_ASCII,

You mean encoding, I assume.

> the COLLATE and CTYPE values for the restored databases
> were en_US.UTF-8. Could this be having an affect?

This is not a great idea, no. You could be getting strange misbehaviors
in e.g. string comparison, because strcoll() will expect UTF8 data and
will likely not cope well with data that isn't valid in that encoding.

If you can't sanitize the encoding of your data, I'd suggest running
with lc_collate and lc_ctype set to "C".

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Keith Fiske 2018-04-16 16:16:48 Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade
Previous Message Adrian Klaver 2018-04-16 15:55:06 Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade