Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade

From: Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade
Date: 2018-04-16 16:16:48
Message-ID: CAODZiv75hfwLbfH0BWKFdq6c5M5NjyhB+oexLWPDvpJuwBLmeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Apr 16, 2018 at 12:09 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com> writes:
> > Running into an issue with helping a client upgrade from 8.3 to 10 (yes,
> I
> > know, please keep the out of support comments to a minimum, thanks :).
>
> > The old database was in SQL_ASCII and it needs to stay that way for now
> > unfortunately. The dump and restore itself works fine, but we're now
> > running into issues with some data returning encoding errors unless we
> > specifically set the client_encoding value to SQL_ASCII.
>
> I'm guessing you might be hitting this 9.1 change:
>
> * Have psql set the client encoding from the operating system locale
> by default (Heikki Linnakangas)
>
> This only happens if the PGCLIENTENCODING environment variable is
> not set.
>
> I think the previous default was to set client encoding equal to the
> server encoding.
>
> > Looking at the 8.3 database, it has the client_encoding value set to UTF8
> > and queries seem to work fine. Is this just a bug in the old 8.3 not
> > enforcing encoding properly?
>
> Somewhere along the line we made SQL_ASCII -> something else conversions
> check that the data was valid per the other encoding, even though no
> actual data change happens.
>
> > The other thing I noticed on the 10 instance was that, while the LOCALE
> was
> > set to SQL_ASCII,
>
> You mean encoding, I assume.
>
> > the COLLATE and CTYPE values for the restored databases
> > were en_US.UTF-8. Could this be having an affect?
>
> This is not a great idea, no. You could be getting strange misbehaviors
> in e.g. string comparison, because strcoll() will expect UTF8 data and
> will likely not cope well with data that isn't valid in that encoding.
>
> If you can't sanitize the encoding of your data, I'd suggest running
> with lc_collate and lc_ctype set to "C".
>
> regards, tom lane
>

Thanks to both of you Adrian & Tom.

It is the 9.1 change to the psql client that seems to be causing this.

And pg_controldata was able to show that the CTYPE and COLLATE were UTF8 on
the old system. If that's the case, do you still think it's a good idea to
set the COLLATE and CTYPE to "C"?

--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2018-04-16 16:21:48 Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade
Previous Message Tom Lane 2018-04-16 16:09:09 Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade