Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade
Date: 2018-04-16 15:55:06
Message-ID: d12e4b06-756e-5a61-f6d0-97d9cfeca991@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 04/16/2018 08:16 AM, Keith Fiske wrote:
> Running into an issue with helping a client upgrade from 8.3 to 10 (yes,
> I know, please keep the out of support comments to a minimum, thanks :).
>
> The old database was in SQL_ASCII and it needs to stay that way for now
> unfortunately. The dump and restore itself works fine, but we're now
> running into issues with some data returning encoding errors unless we
> specifically set the client_encoding value to SQL_ASCII.
>
> Looking at the 8.3 database, it has the client_encoding value set to
> UTF8 and queries seem to work fine. Is this just a bug in the old 8.3
> not enforcing encoding properly?e

AFAIK, SQL_ASCII basically means no encoding:

https://www.postgresql.org/docs/10/static/multibyte.html

"The SQL_ASCII setting behaves considerably differently from the other
settings. When the server character set is SQL_ASCII, the server
interprets byte values 0-127 according to the ASCII standard, while byte
values 128-255 are taken as uninterpreted characters. No encoding
conversion will be done when the setting is SQL_ASCII. Thus, this
setting is not so much a declaration that a specific encoding is in use,
as a declaration of ignorance about the encoding. In most cases, if you
are working with any non-ASCII data, it is unwise to use the SQL_ASCII
setting because PostgreSQL will be unable to help you by converting or
validating non-ASCII characters."

What client are you working with?

If psql then its behavior has changed between 8.3 and 10:

https://www.postgresql.org/docs/10/static/release-9-1.html#id-1.11.6.121.3

"

Have psql set the client encoding from the operating system locale by
default (Heikki Linnakangas)

This only happens if the PGCLIENTENCODING environment variable is not set.
"

https://www.postgresql.org/docs/10/static/app-psql.html

"If both standard input and standard output are a terminal, then psql
sets the client encoding to “auto”, which will detect the appropriate
client encoding from the locale settings (LC_CTYPE environment variable
on Unix systems). If this doesn't work out as expected, the client
encoding can be overridden using the environment variable PGCLIENTENCODING."

>
> The other thing I noticed on the 10 instance was that, while the LOCALE
> was set to SQL_ASCII, the COLLATE and CTYPE values for the restored
> databases were en_US.UTF-8. Could this be having an affect? Is there any
> way to see what these values were on the old 8.3 database? The
> pg_database catalog does not have these values stored back then.
>
> --
> Keith Fiske
> Senior Database Engineer
> Crunchy Data - http://crunchydata.com

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2018-04-16 16:09:09 Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade
Previous Message Adrian Klaver 2018-04-16 15:37:42 Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade