Re: Experience and feedback on pg_restore --data-only

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Dimitrios Apostolou <jimis(at)gmx(dot)net>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Experience and feedback on pg_restore --data-only
Date: 2025-03-23 15:37:11
Message-ID: 9e8852ec-d8fa-4fb6-a2d3-cd188ce0744a@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 3/20/25 15:48, Dimitrios Apostolou wrote:
> Rationale:
>
> When restoring a backup in an emergency situation, it's fine to run
> pg_restore as superuser and get an exact replica of the dumped db.
> AFAICT pg_restore (without --data-only) is optimised for such case.
>
> But pg_dump/restore can be used as a generic data-copying utility, and in
> those cases it makes often sense to get rid of the churn and create a
> clean database by running the SQL schema definition from version control,
> and then copy the data for only the tables created.
>
> For this case, I choose to run pg_restore --data-only, and run it as the
> user who owns the database (dbowner), not as a superuser, in order to
> avoid changes being introduced under the radar.
>
> Things that made my life hard:
>
> * plenty of permission denials for both ALTER OWNER or SET SESSION
>   AUTHORIZATION (depending on command line switches).  Both of these
>   require superuser privilege, but in my case this is not really needed.
>   Dbowner has CREATEROLE and is the one who creates all the roles (WITH
>   SET TRUE), and their private schemata in the specific database.  Things
>   would work if pg_restore did "SET ROLE" instead of "SET SESSION
>   AUTHORIZATION" to switch user. Is this a straightforward change or there
>   are issues I don't see?

If this is --data-only what are the ALTER OWNER and SET SESSION
AUTHORIZATION for?

>
> * After each failed attempt, I need to issue a TRUNCATE table1,table2,...
>   before I try again.  I wrote my own function for that. It would help if
>   pg_restore would optionally truncate before COPY.  I believe it would
>   require superuser privilege for it, that could achieve using the
>   --superuser=username option used today for disabling the triggers.

That is what --clean is for, though it needs to have the objects(tables)
be in the restore e.g. not just --data-only.

>
> Performance issues: (important as my db size is >5TB)
>
> * WAL writes: I didn't manage to avoid writing to the WAL, despite having
>   setting wal_level=minimal. I even wrote my own function to ALTER all
>   tables to UNLOGGED, but failed with "could not change table T to
>   unlogged because it references logged table".  I'm out of ideas on this
>   one.
>
> * Indices: Could pg_restore have a switch to DROP indices before each
>   COPY, and re-CREATE them after, exactly as they were?  This would speed
>   up the process quite a bit.
>
>
> Any feedback for improving my process? Should I put these ideas somewhere
> as ideas for improvement on pg_restore?
>
> Thank you in advance,
> Dimitris
>
>
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Laurenz Albe 2025-03-23 16:00:39 Re: Experience and feedback on pg_restore --data-only
Previous Message David G. Johnston 2025-03-23 14:50:21 Re: Need help understanding has_function_privilege