pg_dump's "--exclude-table" and "--exclude-table-data" options are ignored and/or cause the dump to fail entirely unless both the schema and table name use 1950s-era identifiers.

From: tutiluren(at)tutanota(dot)com
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: pg_dump's "--exclude-table" and "--exclude-table-data" options are ignored and/or cause the dump to fail entirely unless both the schema and table name use 1950s-era identifiers.
Date: 2020-07-21 06:30:05
Message-ID: MCk38UV--3-2@tutanota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

After wasting yet another full day on this what should be a non-problem, I have come to the equally sad and maddening conclusion that the following crucial features of `pg_dump` are utterly broken, possibly "only" on Windows (only tested there):

--exclude-table
--exclude-table-data

The problem: the options *refuse* to work if your schema or table name contains any non-lowercase, non-ASII character. I've tested it heavily today (with PostgreSQL 11.7 on Windows 10). I'm 99% sure by now that I'm not making some kind of "obvious mistake", including setting the encoding in every possible manner...

Pardon me if it's actually been fixed recently, but it seems **extremely** unlikely. I couldn't find any mention of anything related to "--exclude-table" having been fixed in any changelog.

PostGIS did the same thing: it stubbornly ignored PostgreSQL's own rules of how you can name a schema/table, eventually bullying me into renaming its schema "postgis" because I *needed* it to work, which now looks awfully inconsistent with my other, properly named schemas.

I *need* this to work as well, because I can't keep making full backups of gigantic tables full of longterm-useless debug data. However, I will not let `pg_dump` bully me into renaming my entire database structure. I'm never, ever going through that again in my life. Plus I'm actually following its long-established rules, so I objectively am not doing anything wrong -- `pg_dump` is. Yet I'm the one who has to suffer... And yes, I know that it's volunteers and so on. I've paid to the PostgreSQL project with a large part of my life instead of money, with endless nights struggling with things like this.

PostGIS is a third-party(-ish) extension, so it's *slightly* more understandable there. `pg_dump`, on the other hand, is the **official tool for backing up a PostgreSQL database**, yet it still somehow doesn't understand PostgreSQL's own syntax! The PostgreSQL manual very clearly states that you can name your schemas and tables to (almost) anything you want, as long as you double-quote them when referring to them, as expected/natural.

Only its own `pg_dump` tool doesn't follow this rule. It demands -- *assumes*! -- that everything is in all-lowercase, only ASCII, with no spaces. It doesn't matter if you add quotes, which always works in PG SQL queries and which is just "how you do it".

Try it out yourself, by creating a test schema called "Personal stöff" and a table in it called "My däiary". Then create a text column and make it PK and then add the text "This is supposed to be ignored.". Then try to run this command:

pg_dump --format plain --verbose --file "C:\test.txt" --exclude-table-data="Personal stöff"."My däiary" --host="localhost" --port="5432" --username="postgres" --dbname="TestDB"

I've also tried a million variations of those quotes as well, including nested ones, and also tried with and without the --encoding option as well as setting the environment variable for encoding. It doesn't seem to matter -- it either fails to run the command or runs it but just ignores the exclude rules.

The resulting `test.txt` will contain the `This is supposed to be ignored.` text, even though we have told `pg_dump` to exclude the data in the specified table.

If you rename the schema name and table name to `personal_stuff.my_diary`, it will suddenly work.

This feature is apparently nine years old, if a blog post I found is accurate, yet is still at what can only be described as a "pre-alpha test stage". What I wonder, other than what to do now, is what all-lowercase PG users used to do a decade ago when they needed to exclude certain tables' data...? Surely that is a very basic feature for a backup tool?

Just to avoid wasting time, when the command doesn't work at all, it outputs things like this:

pg_dump: [archiver (db)] query failed: ERROR:  invalid byte sequence for encoding "UTF8": 0xf6 0x72 0x66 0x72
pg_dump: [archiver (db)] query was: SELECT c.oid
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n
ON n.oid OPERATOR(pg_catalog.=) c.relnamespace
WHERE c.relkind OPERATOR(pg_catalog.=) ANY
(array['r', 'S', 'v', 'm', 'f', 'p'])
  AND c.relname OPERATOR(pg_catalog.~) '^(table name)$'
  AND n.nspname OPERATOR(pg_catalog.~) '^(schema name)$'

Note how "table name" and "schema name" are both in all-lowercase even though they were inputted as "Table name" and "Schema name". This indicates to me that it uses some extremely crude and inaccurate logic for its internal mechanism/queries.

PS: The thousand-step ordeal I had to go through to finally get to send this e-mail to your e-mail list did not exactly improve my already bad mood... You must have got *extreme* amounts of abuse here to actually implement such a system... I hope that my e-mail is taken seriously, because I find this (and what PostGIS does) very serious.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Pavel Stehule 2020-07-21 06:43:54 Re: pg_dump's "--exclude-table" and "--exclude-table-data" options are ignored and/or cause the dump to fail entirely unless both the schema and table name use 1950s-era identifiers.
Previous Message Andres Freund 2020-07-20 20:50:41 Re: Improvement for query planner? (no, not about count(*) again ;-))