Experience and feedback on pg_restore --data-only

From: Dimitrios Apostolou <jimis(at)gmx(dot)net>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Experience and feedback on pg_restore --data-only
Date: 2025-03-20 22:48:11
Message-ID: 53760c70-4a87-a453-9e02-57abc9cb2e54@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Rationale:

When restoring a backup in an emergency situation, it's fine to run
pg_restore as superuser and get an exact replica of the dumped db.
AFAICT pg_restore (without --data-only) is optimised for such case.

But pg_dump/restore can be used as a generic data-copying utility, and in
those cases it makes often sense to get rid of the churn and create a
clean database by running the SQL schema definition from version control,
and then copy the data for only the tables created.

For this case, I choose to run pg_restore --data-only, and run it as the
user who owns the database (dbowner), not as a superuser, in order to
avoid changes being introduced under the radar.

Things that made my life hard:

* plenty of permission denials for both ALTER OWNER or SET SESSION
AUTHORIZATION (depending on command line switches). Both of these
require superuser privilege, but in my case this is not really needed.
Dbowner has CREATEROLE and is the one who creates all the roles (WITH
SET TRUE), and their private schemata in the specific database. Things
would work if pg_restore did "SET ROLE" instead of "SET SESSION
AUTHORIZATION" to switch user. Is this a straightforward change or there
are issues I don't see?

* After each failed attempt, I need to issue a TRUNCATE table1,table2,...
before I try again. I wrote my own function for that. It would help if
pg_restore would optionally truncate before COPY. I believe it would
require superuser privilege for it, that could achieve using the
--superuser=username option used today for disabling the triggers.

Performance issues: (important as my db size is >5TB)

* WAL writes: I didn't manage to avoid writing to the WAL, despite having
setting wal_level=minimal. I even wrote my own function to ALTER all
tables to UNLOGGED, but failed with "could not change table T to
unlogged because it references logged table". I'm out of ideas on this
one.

* Indices: Could pg_restore have a switch to DROP indices before each
COPY, and re-CREATE them after, exactly as they were? This would speed
up the process quite a bit.

Any feedback for improving my process? Should I put these ideas somewhere
as ideas for improvement on pg_restore?

Thank you in advance,
Dimitris

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Paquier 2025-03-21 00:54:55 Re: query_id: jumble names of temp tables for better pg_stat_statement UX
Previous Message Tom Lane 2025-03-20 21:56:05 Re: After upgrading libpq, the same function(PQftype) call returns a different OID