Re: Import/Export failure due to UTF-8 error in pgAdmin4 but not in pgAdmin3

From: richard coleman <rcoleman(dot)ascentgl(at)gmail(dot)com>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Nanina Tron <nanina(dot)tron(at)icloud(dot)com>, "pgadmin-support lists(dot)postgresql(dot)org" <pgadmin-support(at)lists(dot)postgresql(dot)org>
Subject: Re: Import/Export failure due to UTF-8 error in pgAdmin4 but not in pgAdmin3
Date: 2019-01-08 13:23:16
Message-ID: CAGA3vBvG2_W2GDKPRjuXeHhcvf1Ct-R-Zrf=YTDv3Y+qGDZqoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgadmin-support

Dave,

Thanks for continuing this discussion, but I think you misunderstand the
situation. I am storing *valid* *non-UTF8* data in a SQL_ASCII encoded
postgreSQL database (please re-read what I had previously written). This is
why psql has *NO* problem dealing with it. This is also why Windows ODBC
and .Net applications have *NO* problem dealing with it. In fact the most
common character that pgAdmin4 crashes on is the Windows *smart quote*. So
to reiterate, I am using *valid **non-UTF8* characters in a SQL_ASCII
database. This is a supported configuration for postgreSQL. The issue
seems to be that pgAdmin4 is *assuming * UTF8 data and
crashing/failing/throwing errors when it encounters invalid UTF8 characters.

I hope I have made the situation a little bit clearer.

Thanks again,

rik.

On Tue, Jan 8, 2019 at 12:29 AM Dave Page <dpage(at)pgadmin(dot)org> wrote:

> Hi
>
> On Tue, Jan 8, 2019 at 12:47 AM richard coleman
> <rcoleman(dot)ascentgl(at)gmail(dot)com> wrote:
> >
> > Dave,
> >
> > Thanks for taking the time to respond, but I don't see anywhere that
> SQL_ASCII is recommended against doing. Here's the documentation listing
> the supported encoding schemas:
> https://www.postgresql.org/docs/current/multibyte.html .
> >
> > The only caveats listed for SQL_ASCII are:
> >>
> >> In most cases, if you are working with any non-ASCII data, it is unwise
> to use the SQL_ASCII setting because PostgreSQL will be unable to help you
> by converting or validating non-ASCII characters.
>
> You highlighted it below: "If the client character set is defined as
> SQL_ASCII, encoding conversion is disabled, regardless of the server's
> character set. Just as for the server, use of SQL_ASCII is unwise
> unless you are working with all-ASCII data"
>
> You're using UTF-8 data, not ASCII, which it says is unwise because
> conversion won't take place (and consequently, neither will
> validation). I don't see how one could read that and not take it as
>
> You are running into exactly that problem; and it's visible when
> working with technologies that are strict about following encoding
> rules - in this case, psql when pgAdmin shells out to it.
>
> I did think of one possible quick fix this morning which I'll look
> into, but as I noted before; it's a workaround, and the real problem
> is storing un-validated UTF-8 data in a SQL_ASCII database.
>
> > Or, a reminder that postgreSQL can't help with any conversions you might
> want to do.
> >
> > Then there's this:
> >>
> >> PostgreSQL will allow superusers to create databases with SQL_ASCII
> encoding even when LC_CTYPE is not C or POSIX. As noted above, SQL_ASCII
> does not enforce that the data stored in the database has any particular
> encoding, and so this choice poses risks of locale-dependent misbehavior.
> Using this combination of settings is deprecated and may someday be
> forbidden altogether.
> >
> >
> > A note that you can currently choose incompatible settings, but probably
> can't in the future.
> >
> > And finally there's this bit of advice:
> >>
> >> If the client character set is defined as SQL_ASCII, encoding
> conversion is disabled, regardless of the server's character set. Just as
> for the server, use of SQL_ASCII is unwise unless you are working with
> all-ASCII data[emphasis mine].
> >
> >
> > Which is just a reiteration of the first caveat, that if you are using
> SQL_ASCII the database won't perform any conversions on your behalf.
> >
> > That is hardly a recommendation against using that supported encoding
> scheme. The fact that the psql command prompt, among others, works with it
> without issue, is an indication that the problem lies in pgAdmin4 (and I
> would guess the reliance of python on UTF8) than an issue with the database
> itself. pgAdmin4 needs to check for and more gracefully handle valid
> postgreSQL data that might happen to be not UTF8 compliant.
> >
> > Until then, I will have to periodically scan and clean for bad UTF8 data
> to keep pgAdmin4 (and other JDBC dependent code) happy. The legacy
> enterprise .Net applications that depend on it prohibit converting it to
> UTF8 (or anything else for that matter).
> >
> > Just my $0.02,
> >
> > rik.
> >
> >
> > On Mon, Jan 7, 2019 at 1:27 PM Dave Page <dpage(at)pgadmin(dot)org> wrote:
> >>
> >> Hi
> >>
> >> On Mon, Jan 7, 2019 at 11:30 PM richard coleman
> >> <rcoleman(dot)ascentgl(at)gmail(dot)com> wrote:
> >> >
> >> > Dave,
> >> >
> >> > I can't speak to Nania's specific issue, but I believe it's a
> pgAdmin4 specific problem, at least in so far as SQL_ASCII is concerned. I
> say this because I can usually work with the data just fine from the psql
> prompt, but not through pgAdmin4 (or other postgreSQL GUI's like dBeaver
> that rely on the JDBC connection). .Net/Windows ODBC drivers and psql
> command prompt, no problem (as was pgAdmin3 assuming you don't do too much
> with it beyond select/update/insert). pgAdmin4, SELECT, export, etc. BOOM!
> At least until you cleaned up the offending bytes.
> >> >
> >> > Just my $0.02.
> >>
> >> I'm afraid the fundamental problem is that you're using PostgreSQL in
> >> a way that the docs specifically recommend against doing, and you're
> >> seeing the reason why.
> >>
> >> pgAdmin 3 and 4 are completely different. In the import/export utility
> >> that Nania reported the issue in, pgAdmin doesn't look at the data *at
> >> all*. It simply executes \copy in psql, which does all the work. All
> >> pgAdmin does is provide connection info and options to psql, based on
> >> the selections made in the import/export dialogue, and executes it.
> >>
> >> In other areas of pgAdmin, like the query tool, it is possible to see
> >> similar issues with the same underlying cause, though we've spent a
> >> significant amount of time trying to work around all the possible edge
> >> cases.
> >>
> >> pgAdmin 3 implemented import/export itself, using underlying libraries
> >> that were far less strict about encoding rules than Python is. That
> >> may have been more convenient for this particular issue, but it's a
> >> lot worse in many others.
> >>
> >> As a general thought (and do bear in mind, we've spent significant
> >> time and resources on these issues in the past), I'd far rather spend
> >> time on new features and actual bugs, than further time on workarounds
> >> for things the PostgreSQL docs specifically advise against doing.
> >>
> >> --
> >> Dave Page
> >> Blog: http://pgsnake.blogspot.com
> >> Twitter: @pgsnake
> >>
> >> EnterpriseDB UK: http://www.enterprisedb.com
> >> The Enterprise PostgreSQL Company
>
>
>
> --
> Dave Page
> Blog: http://pgsnake.blogspot.com
> Twitter: @pgsnake
>
> EnterpriseDB UK: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

In response to

Responses

Browse pgadmin-support by date

  From Date Subject
Next Message Dave Page 2019-01-08 13:37:43 Re: Import/Export failure due to UTF-8 error in pgAdmin4 but not in pgAdmin3
Previous Message Khushboo Vashi 2019-01-08 10:35:26 Re: "Statistics" tab is very slow