Re: Import/Export failure due to UTF-8 error in pgAdmin4 but not in pgAdmin3

From: richard coleman <rcoleman(dot)ascentgl(at)gmail(dot)com>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Nanina Tron <nanina(dot)tron(at)icloud(dot)com>, "pgadmin-support lists(dot)postgresql(dot)org" <pgadmin-support(at)lists(dot)postgresql(dot)org>
Subject: Re: Import/Export failure due to UTF-8 error in pgAdmin4 but not in pgAdmin3
Date: 2019-01-07 17:59:48
Message-ID: CAGA3vBuJSQxxjNd4tkCqNyXhenw3jipR96AGoGUdHn57x8g4qA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgadmin-support

Dave,

I can't speak to Nania's specific issue, but I believe it's a
*pgAdmin4* specific
problem, at least in so far as SQL_ASCII is concerned. I say this because
I can usually work with the data just fine from the psql prompt, but
*not* through
pgAdmin4 (or other postgreSQL GUI's like dBeaver that rely on the JDBC
connection). .Net/Windows ODBC drivers and psql command prompt, no problem
(as was pgAdmin3 assuming you don't do too much with it beyond
select/update/insert). pgAdmin4, SELECT, export, etc. *BOOM*! At least
until you *cleaned * up the offending bytes.

Just my $0.02.

rik.

On Mon, Jan 7, 2019 at 12:49 PM Dave Page <dpage(at)pgadmin(dot)org> wrote:

> On Mon, Jan 7, 2019 at 9:11 PM richard coleman
> <rcoleman(dot)ascentgl(at)gmail(dot)com> wrote:
> >
> > Nania,
> >
> > Welcome to the wonderful world of pgAdmin4. I have been bitten often by
> this particular shortcoming in pgAdmin4. :( My issue seems to stems from
> the fact that I use ASCII tables as a back end for a .Net windows
> application and perfectly valid windows (Word/Excel) characters cause
> pgAdmin4 no end of issues.
>
> pgAdmin has nothing to do with this. It is simply calling PostgreSQL's
> psql utility, and telling it to import or export the file. The
> database server is then throwing the error seen.
>
> > My solution (with the help of some fine people on the postgres IRC
> channel) is to run a couple of functions on my tables/fields to locate and
> clean the offending characters out. Of course, if you need those
> characters, then this won't actually help. Here they are in the advent
> that they might prove helpful/adaptable to your situation.
>
> The problem with that is that you're trying to fix something that's
> basically broken to begin with. From the PostgreSQL docs
> (https://www.postgresql.org/docs/current/multibyte.html):
>
> ----
> The SQL_ASCII setting behaves considerably differently from the other
> settings. When the server character set is SQL_ASCII, the server
> interprets byte values 0-127 according to the ASCII standard, while
> byte values 128-255 are taken as uninterpreted characters. No encoding
> conversion will be done when the setting is SQL_ASCII. Thus, this
> setting is not so much a declaration that a specific encoding is in
> use, as a declaration of ignorance about the encoding. In most cases,
> if you are working with any non-ASCII data, it is unwise to use the
> SQL_ASCII setting because PostgreSQL will be unable to help you by
> converting or validating non-ASCII characters.
> ----
>
>
> > Finds what pgAdmin4 considers bad UTF8:
> > CREATE OR REPLACE FUNCTION live.is_utf8(
> > text)
> > RETURNS boolean
> > LANGUAGE 'sql'
> >
> > COST 100
> > VOLATILE
> > AS $BODY$
> > select encode(convert_to($1,'SQL_ASCII'),'hex')
> > ~ $r$(?x)
> > ^(?:(?:[0-7][0-9a-f])
> > |(?:(?:c[2-9a-f]|d[0-9a-f])
> > |e0[ab][0-9a-f]
> > |ed[89][0-9a-f]
> > |(?:(?:e[1-9abcef])
> > |f0[9ab][0-9a-f]
> > |f[1-3][89ab][0-9a-f]
> > |f48[0-9a-f]
> > )[89ab][0-9a-f]
> > )[89ab][0-9a-f]
> > )*$
> > $r$;
> > $BODY$;
> >
> > ALTER FUNCTION live.is_utf8(text)
> > OWNER TO postgres;
> >
> >
> > Fixes what pgAdmin4 considers to be bad UTF8:
> >>
> >> CREATE OR REPLACE FUNCTION live.badutf8(
> >> text)
> >> RETURNS text
> >> LANGUAGE 'sql'
> >> COST 100
> >> VOLATILE
> >> AS $BODY$
> >> select regexp_replace(encode(convert_to($1,'SQL_ASCII'),'hex'),
> >> $r$(?x)
> >> (?:(?:[0-7][0-9a-f])
> >> |(?:(?:c[2-9a-f]|d[0-9a-f])
> >> |e0[ab][0-9a-f]
> >> |ed[89][0-9a-f]
> >> |(?:(?:e[1-9abcef])
> >> |f0[9ab][0-9a-f]
> >> |f[1-3][89ab][0-9a-f]
> >> |f48[0-9a-f]
> >> )[89ab][0-9a-f]
> >> )[89ab][0-9a-f]
> >> )*(..)?
> >> $r$, '-\1-', 'g')
> >> $BODY$;
> >> ALTER FUNCTION live.badutf8(text)
> >> OWNER TO postgres;
> >
> >
> > Fixes bad UTF8
> >
> >
> >
> > On Mon, Jan 7, 2019 at 8:40 AM Nanina Tron <nanina(dot)tron(at)icloud(dot)com>
> wrote:
> >>
> >> Hi,
> >>
> >> I am pretty new to PostgreSQL so I might just miss something basic here.
> >>
> >> My problem is that, I cannot import or export some of the tables in my
> db with pgAdmin4, as it raises the “ERROR: unvalid byte-sequenz for coding
> »UTF8«: 0xdf 0x67“”. The table was originally created with Excel and
> imported via pgAdmin3. The strange thing is that it can still be imported
> and exported with pgAdmin3 but not with pgAdmin4. The db was created with
> encoding UTF-8, the .csv files where created with encoding UTF-8 and also
> the import/export dialog is set to UTF-8. Queries are also no problem on
> these tables so it seems to me that this could be a client problem.
> >>
> >> I am running PostgreSQL 11.1 on a server (I don’t know the OS,
> maintained with pgAdmin4). Locally I am working on a Windows 7 Professional
> (Service Pack 1) 64 Bit-System and pgAdmin4 3.6 & pgAdmin3.
> >>
> >> I did not find any hint of the same problem on my Google or archive
> search, so I would be very grateful for any idea what I am doing wrong here.
> >>
> >> Best,
> >>
> >> Nanina
>
>
>
> --
> Dave Page
> Blog: http://pgsnake.blogspot.com
> Twitter: @pgsnake
>
> EnterpriseDB UK: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

In response to

Responses

Browse pgadmin-support by date

  From Date Subject
Next Message Dave Page 2019-01-07 18:27:04 Re: Import/Export failure due to UTF-8 error in pgAdmin4 but not in pgAdmin3
Previous Message Dave Page 2019-01-07 17:49:03 Re: Import/Export failure due to UTF-8 error in pgAdmin4 but not in pgAdmin3