Re: pg_dump seems to be broken in regards to the "--exclude-table-data" option on Windows.

From: tutiluren(at)tutanota(dot)com
To: Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>
Cc: Pgsql Bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_dump seems to be broken in regards to the "--exclude-table-data" option on Windows.
Date: 2020-07-24 19:34:16
Message-ID: MD1JPJm--3-2@tutanota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Jul 24, 2020, 12:30 PM by juanjo(dot)santamaria(at)gmail(dot)com:

>
> On Fri, Jul 24, 2020 at 8:14 AM David G. Johnston <> david(dot)g(dot)johnston(at)gmail(dot)com> > wrote:
>
>> On Thursday, July 23, 2020, <>> tutiluren(at)tutanota(dot)com>> > wrote:
>>
>>> After taking a longer break from my problem, I have now made a fresh, clean, scientifically conducted experiment in order to truly get to the bottom of this annoying problem once and for all.
>>>
>>> First, I set cmd.exe to use Unicode, just to be sure:
>>>
>>> C:\pg_dump_test>chcp 65001
>>> Active code page: 65001
>>>
>>> pg_dump: creating CONSTRAINT "Test schäma.Test täble Test täble_pkey"
>>>
>>> = WORKS. The dump was successful. (In spite of weird output chars.)
>>>
>>> C:\pg_dump_test>pg_dump --format plain --verbose --file "testdump.txt" --exclude-table-data="Test schäma.Test täble" --host="localhost" --port="5432" --username="postgres" --dbname="test"
>>> pg_dump: last built-in OID is 16383
>>> pg_dump: [archiver (db)] query failed: ERROR:  invalid byte sequence for encoding "UTF8": 0xe4 0x62 0x6c
>>> pg_dump: [archiver (db)] query was: SELECT c.oid
>>> FROM pg_catalog.pg_class c
>>> LEFT JOIN pg_catalog.pg_namespace n
>>> ON n.oid OPERATOR(pg_catalog.=) c.relnamespace
>>> WHERE c.relkind OPERATOR(pg_catalog.=) ANY
>>> (array['r', 'S', 'v', 'm', 'f', 'p'])
>>>   AND c.relname OPERATOR(pg_catalog.~) '^(test täble)$'
>>>   AND n.nspname OPERATOR(pg_catalog.~) '^(test schäma)$'
>>>
>>> = FAILED. The dump was aborted with these nonsensical errors.
>>>
>>> Finally, I tried the same command again like this:
>>>
>>> I looked everywhere for some kind of "client-encoding" option in the pg_dump manual, but there is no such thing. The only thing I can think of is that the client's encoding (that is, pg_dump) is for some reason not set to "UTF8" even though that's the encoding of the "test" database which I'm connecting to.
>>>
>>
>> The main thing here is this isn’t properly considered a bug in pg_dump but rather with most likely any client command run in the cmd.exe shell.
>>
>
> The problem is coming from CMD and its UTF8 support, but I guess I am not making a great job in convincing you. I will try to explain it through an example:
>
> 1. Start a CMD using "CMD /U", and set the code page to UTF8:
>
> C:\postgres>chcp 65001
> Active code page: 65001
>
> 2. Create a database named 'test', an schema named "Ö" and table named "Ä":
>
> C:\postgres>psql -c "create database test" postgres
> CREATE DATABASE
>
> C:\postgres>psql -c "\l test" postgres
>                                               List of databases
>  Name |    Owner     | Encoding |          Collate           |           Ctype            | Access privileges
> ------+--------------+----------+----------------------------+----------------------------+-------------------
>  test | postgres     | UTF8     | English_United States.1252 | English_United States.1252 |
> (1 row)
>
> C:\postgres>psql -c "create schema "Ö"" test
> CREATE SCHEMA
> C:\postgres>psql -c "create table "Ö"."Ä" (c1 char(1))" test
> CREATE TABLE
>
> 3. Test the table and check your client encoding:
>
> C:\postgres>psql -c "select * from "Ö"."Ä"" test
>  c1
> ----
> (0 rows)
>
> C:\postgres>psql -c "show client_encoding" test
>  client_encoding
> -----------------
>  WIN1252
> (1 row)
>
> 4. Change the client encoding to UTF8 and repeat the test:
>
> C:\postgres>set PGCLIENTENCODING=UTF8
>
> C:\postgres>psql -c "show client_encoding" test
>  client_encoding
> -----------------
>  UTF8
> (1 row)
>
> C:\postgres>psql -c "select * from "Ö"."Ä"" test
> 2020-07-24 11:25:01.710 CEST [1452] ERROR:  invalid byte sequence for encoding "UTF8": 0xd6 0x2e
> ERROR:  invalid byte sequence for encoding "UTF8": 0xd6 0x2e
>
> The reason why this happens is because the limited CMD's UTF8 support, 'chcp' only changes the code page of the displayed characters, shell commands, piping, redirection and most commands are still ANSI only.
>
> Does explicitly setting 'PGCLIENTENCODING=WIN1252' on your shell solves the issue?
>
> Regards,
>
> Juan José Santamaría Flecha
>
Thanks for putting in the effort to help.
Thanks for teaching me about the "cmd.exe /U" option; I never even knew that it existed and was always annoyed by how Unicode characters looked messed up even though Unicode worked (outside of this case).
As a matter of fact, 'PGCLIENTENCODING=WIN1252' causes it to stop failing and does dump the database. To make 100% sure that this was it, I set 'PGCLIENTENCODING=UTF8' immediately afterwards and repeated the command: failed. Then changed it back to 'PGCLIENTENCODING=WIN1252' once again and repeated: worked. So it's definitely due to this, which makes zero sense since the database *HAS* the encoding UTF8!
(Lots of too-early negative conclusions by me removed here.) It turns out that, using your =WIN1252 hack, and this specific version of the exclude syntax: --exclude-table-data="\"Test schäma\".\"Test täble\"" ... it works. It dumps correctly and ignores that specific table in that schema.
If anyone responsible for the documentation for PostgreSQL is reading this, please update it to make clear what syntax one is supposed to use for the confusing --exclude-table-data option!
Thanks for finding a workaround, but I sure don't like the fact that I have to lie to PG about the encoding to be able to dump my database. Who knows what kind of unintended consequences this might have? Will it dump the DB's contents incorrectly? Will it remove/garble any non-WIN1252-compatible data? Will it corrupt my backups? It feels horribly wrong to set the wrong encoding like this, when *everything* I do/use works in UTF-8...

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David G. Johnston 2020-07-24 19:42:35 pg_dump seems to be broken in regards to the "--exclude-table-data" option on Windows.
Previous Message Michał Lis 2020-07-24 18:33:59 Re: BUG #16550: Problem with pg_service.conf