Re: [External] postgres 9.5 DB corruption: invalid byte sequence for encoding "UTF8"

From: Thomas Tignor <tptignor(at)yahoo(dot)com>
To: Brad Nicholson <bradn(at)ca(dot)ibm(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>, Vijaykumar Jain <vjain(at)opentable(dot)com>
Subject: Re: [External] postgres 9.5 DB corruption: invalid byte sequence for encoding "UTF8"
Date: 2019-03-26 18:06:15
Message-ID: 1013768132.11403739.1553623575830@mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello again Brad, and hello Tom. Thanks for writing, and Tom I'll add some extra thanks for the many google searches over the years which have been answered by your posts. :)
It seems there's been some confusion on the data flow so I'll try to explain. The original "source" I referred to was our JVM generating alert data and performing DML on the DB via the v42.0.0 driver. I do not know of any encoding setting for the driver itself.
The driver writes directly to our primary (Slony-I master) DB. Changes are then replicated by our replica (Slony-I subscriber) DBs. Each of these DBs has a "UTF8" server encoding. COPY out ops (performed by Slony-I or anything else) always work fine, and COPY in ops fail if there is data corruption.

ams(at)ams3(dot)dfw(dot)netmgmt:/a/ams/lib/ext/database$ sum postgresql-42.0.0.jar 

12476   691

ams(at)ams3(dot)dfw(dot)netmgmt:/a/ams/lib/ext/database$ 

ams(at)ams3(dot)dfw(dot)netmgmt:~$ echo ---- Slony-I Master ----

---- Slony-I Master ----

ams(at)ams3(dot)dfw(dot)netmgmt:~$ psql -U akamai -d ams -c "SELECT datname, pg_encoding_to_char(encoding) FROM pg_database WHERE datname = 'ams'"

 datname | pg_encoding_to_char 

---------+---------------------

 ams     | UTF8

(1 row)

ams(at)ams3(dot)dfw(dot)netmgmt:~$ 

root(at)ams-repl7(dot)sjc(dot)netmgmt:~# echo ---- Slony-I Replica ----

---- Slony-I Replica ----

root(at)ams-repl7(dot)sjc(dot)netmgmt:~# psql -U akamai -d ams -c "SELECT datname, pg_encoding_to_char(encoding) FROM pg_database WHERE datname = 'ams'"

 datname | pg_encoding_to_char 

---------+---------------------

 ams     | UTF8

(1 row)

root(at)ams-repl7(dot)sjc(dot)netmgmt:~# 

Tom    :-)

On Tuesday, March 26, 2019, 10:14:59 AM EDT, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

"Brad Nicholson" <bradn(at)ca(dot)ibm(dot)com> writes:
> Thomas Tignor <tptignor(at)yahoo(dot)com> wrote on 03/25/2019 08:25:49 PM:
>> Thanks for writing. As I mentioned to Vijay, the "source" is a JVM
>> using the postgres v42.0.0 JDBC driver. I do not believe we have any
>> explicit encoding set, and so I expect the client encoding is
>> SQL_ASCII. The DB is most definitely UTF8.

> These statements are contradictory.

> The value of client_encoding from your select on pg_settings is SQL_ASCII.

> The docs clearly state:
> https://www.postgresql.org/docs/9.5/runtime-config-client.html
> "Sets the client-side encoding (character set). The default is to use the
> database encoding. "

> If you don't have client_encoding explicitly, then it is using the database
> encoding.

Umm ... not necessarily.  That bit in runtime-config-client.html
correctly states what the *server's* default for client_encoding is,
but an awful lot of client-side code will immediately override that.
psql will try to set it based on its LANG/LC_CTYPE environment,
for example.  I don't know what JDBC does; it might be different.

But in any case, yes, it'd be more reliable to check pg_database.encoding
explicitly.

            regards, tom lane

In response to

Browse pgsql-general by date

  From Date Subject
Next Message David Rowley 2019-03-27 02:16:43 Re: Column lookup in a row performance
Previous Message Adrian Klaver 2019-03-26 17:55:57 Re: plctl extension issue postgresql 11.2