Re: Import/Export failure due to UTF-8 error in pgAdmin4 but not in pgAdmin3

From: richard coleman <rcoleman(dot)ascentgl(at)gmail(dot)com>
To: Doug Easterbrook <doug(at)artsman(dot)com>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Nanina Tron <nanina(dot)tron(at)icloud(dot)com>, "pgadmin-support lists(dot)postgresql(dot)org" <pgadmin-support(at)lists(dot)postgresql(dot)org>
Subject: Re: Import/Export failure due to UTF-8 error in pgAdmin4 but not in pgAdmin3
Date: 2019-01-08 21:44:55
Message-ID: CAGA3vBvc=Ec_pjKSzr5UF+hwL-ScLOawpCpFc3EHbMUaTr+c7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgadmin-support

Doug,

Hi. Is this a typo?

We’ve Postgres reject putting such stuff into the database using anything .

PostgreSQL allow *any text* into a column if the database has been set to
use SQL_ASCII as the encoding. This was the default encoding on Windows
machines, a long long time ago. All that setting the encoding to SQ:_ASCII
means is that postgreSQL *won't *do any translations between character
encoding for you. We are storing what I believe is Windows-1252 characters
in the database. The immediate problem (at least mine in so far as
pgAdmin4 is concerned) is that non-UTF8 characters (like smart quotes) are
handled *without* a problem by psql, .Net, and Windows ODBC (and under
pgAdmin3) but cause pgAdmin4 to crash/fail/thow errors. The only way to
get pgAdmin4 to behave is to locate and remove the offending characters.
My guess is that this is due to it's python underpinnings and the
expectation that the database would correctly translate the data into valid
UTF8. It is my belief that it should handle such *valid* postgreSQL data
better than it currently does. At least as well as it's predecessor
pgAdmin3 did.

Just my $0.02,

rik.

On Tue, Jan 8, 2019 at 3:56 PM Doug Easterbrook <doug(at)artsman(dot)com> wrote:

> https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
>
> Smart quotes are not ascii. They are Unicode, strictly speaking .
>
> We’ve Postgres reject putting such stuff into the database using anything
> .
>
>
>
> Sent from my iPad
>
> On Jan 8, 2019, at 9:23 AM, richard coleman <rcoleman(dot)ascentgl(at)gmail(dot)com>
> wrote:
>
> Dave,
>
> Thanks for continuing this discussion, but I think you misunderstand the
> situation. I am storing *valid* *non-UTF8* data in a SQL_ASCII encoded
> postgreSQL database (please re-read what I had previously written). This is
> why psql has *NO* problem dealing with it. This is also why Windows ODBC
> and .Net applications have *NO* problem dealing with it. In fact the
> most common character that pgAdmin4 crashes on is the Windows *smart
> quote*. So to reiterate, I am using *valid **non-UTF8* characters in a
> SQL_ASCII database. This is a supported configuration for postgreSQL. The
> issue seems to be that pgAdmin4 is *assuming * UTF8 data and
> crashing/failing/throwing errors when it encounters invalid UTF8 characters.
>
> I hope I have made the situation a little bit clearer.
>
> Thanks again,
>
> rik.
>
> On Tue, Jan 8, 2019 at 12:29 AM Dave Page <dpage(at)pgadmin(dot)org> wrote:
>
>> Hi
>>
>> On Tue, Jan 8, 2019 at 12:47 AM richard coleman
>> <rcoleman(dot)ascentgl(at)gmail(dot)com> wrote:
>> >
>> > Dave,
>> >
>> > Thanks for taking the time to respond, but I don't see anywhere that
>> SQL_ASCII is recommended against doing. Here's the documentation listing
>> the supported encoding schemas:
>> https://www.postgresql.org/docs/current/multibyte.html .
>> >
>> > The only caveats listed for SQL_ASCII are:
>> >>
>> >> In most cases, if you are working with any non-ASCII data, it is
>> unwise to use the SQL_ASCII setting because PostgreSQL will be unable to
>> help you by converting or validating non-ASCII characters.
>>
>> You highlighted it below: "If the client character set is defined as
>> SQL_ASCII, encoding conversion is disabled, regardless of the server's
>> character set. Just as for the server, use of SQL_ASCII is unwise
>> unless you are working with all-ASCII data"
>>
>> You're using UTF-8 data, not ASCII, which it says is unwise because
>> conversion won't take place (and consequently, neither will
>> validation). I don't see how one could read that and not take it as
>>
>> You are running into exactly that problem; and it's visible when
>> working with technologies that are strict about following encoding
>> rules - in this case, psql when pgAdmin shells out to it.
>>
>> I did think of one possible quick fix this morning which I'll look
>> into, but as I noted before; it's a workaround, and the real problem
>> is storing un-validated UTF-8 data in a SQL_ASCII database.
>>
>> > Or, a reminder that postgreSQL can't help with any conversions you
>> might want to do.
>> >
>> > Then there's this:
>> >>
>> >> PostgreSQL will allow superusers to create databases with SQL_ASCII
>> encoding even when LC_CTYPE is not C or POSIX. As noted above, SQL_ASCII
>> does not enforce that the data stored in the database has any particular
>> encoding, and so this choice poses risks of locale-dependent misbehavior.
>> Using this combination of settings is deprecated and may someday be
>> forbidden altogether.
>> >
>> >
>> > A note that you can currently choose incompatible settings, but
>> probably can't in the future.
>> >
>> > And finally there's this bit of advice:
>> >>
>> >> If the client character set is defined as SQL_ASCII, encoding
>> conversion is disabled, regardless of the server's character set. Just as
>> for the server, use of SQL_ASCII is unwise unless you are working with
>> all-ASCII data[emphasis mine].
>> >
>> >
>> > Which is just a reiteration of the first caveat, that if you are using
>> SQL_ASCII the database won't perform any conversions on your behalf.
>> >
>> > That is hardly a recommendation against using that supported encoding
>> scheme. The fact that the psql command prompt, among others, works with it
>> without issue, is an indication that the problem lies in pgAdmin4 (and I
>> would guess the reliance of python on UTF8) than an issue with the database
>> itself. pgAdmin4 needs to check for and more gracefully handle valid
>> postgreSQL data that might happen to be not UTF8 compliant.
>> >
>> > Until then, I will have to periodically scan and clean for bad UTF8
>> data to keep pgAdmin4 (and other JDBC dependent code) happy. The legacy
>> enterprise .Net applications that depend on it prohibit converting it to
>> UTF8 (or anything else for that matter).
>> >
>> > Just my $0.02,
>> >
>> > rik.
>> >
>> >
>> > On Mon, Jan 7, 2019 at 1:27 PM Dave Page <dpage(at)pgadmin(dot)org> wrote:
>> >>
>> >> Hi
>> >>
>> >> On Mon, Jan 7, 2019 at 11:30 PM richard coleman
>> >> <rcoleman(dot)ascentgl(at)gmail(dot)com> wrote:
>> >> >
>> >> > Dave,
>> >> >
>> >> > I can't speak to Nania's specific issue, but I believe it's a
>> pgAdmin4 specific problem, at least in so far as SQL_ASCII is concerned. I
>> say this because I can usually work with the data just fine from the psql
>> prompt, but not through pgAdmin4 (or other postgreSQL GUI's like dBeaver
>> that rely on the JDBC connection). .Net/Windows ODBC drivers and psql
>> command prompt, no problem (as was pgAdmin3 assuming you don't do too much
>> with it beyond select/update/insert). pgAdmin4, SELECT, export, etc. BOOM!
>> At least until you cleaned up the offending bytes.
>> >> >
>> >> > Just my $0.02.
>> >>
>> >> I'm afraid the fundamental problem is that you're using PostgreSQL in
>> >> a way that the docs specifically recommend against doing, and you're
>> >> seeing the reason why.
>> >>
>> >> pgAdmin 3 and 4 are completely different. In the import/export utility
>> >> that Nania reported the issue in, pgAdmin doesn't look at the data *at
>> >> all*. It simply executes \copy in psql, which does all the work. All
>> >> pgAdmin does is provide connection info and options to psql, based on
>> >> the selections made in the import/export dialogue, and executes it.
>> >>
>> >> In other areas of pgAdmin, like the query tool, it is possible to see
>> >> similar issues with the same underlying cause, though we've spent a
>> >> significant amount of time trying to work around all the possible edge
>> >> cases.
>> >>
>> >> pgAdmin 3 implemented import/export itself, using underlying libraries
>> >> that were far less strict about encoding rules than Python is. That
>> >> may have been more convenient for this particular issue, but it's a
>> >> lot worse in many others.
>> >>
>> >> As a general thought (and do bear in mind, we've spent significant
>> >> time and resources on these issues in the past), I'd far rather spend
>> >> time on new features and actual bugs, than further time on workarounds
>> >> for things the PostgreSQL docs specifically advise against doing.
>> >>
>> >> --
>> >> Dave Page
>> >> Blog: http://pgsnake.blogspot.com
>> >> Twitter: @pgsnake
>> >>
>> >> EnterpriseDB UK: http://www.enterprisedb.com
>> >> The Enterprise PostgreSQL Company
>>
>>
>>
>> --
>> Dave Page
>> Blog: http://pgsnake.blogspot.com
>> Twitter: @pgsnake
>>
>> EnterpriseDB UK: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>

In response to

Responses

Browse pgadmin-support by date

  From Date Subject
Next Message Doug Easterbrook 2019-01-08 22:18:53 Re: Import/Export failure due to UTF-8 error in pgAdmin4 but not in pgAdmin3
Previous Message Doug Easterbrook 2019-01-08 20:56:52 Re: Import/Export failure due to UTF-8 error in pgAdmin4 but not in pgAdmin3