Re: Experience and feedback on pg_restore --data-only

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Dimitrios Apostolou <jimis(at)gmx(dot)net>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Experience and feedback on pg_restore --data-only
Date: 2025-03-24 16:09:19
Message-ID: d4c27299-11c8-4950-b0c1-bf24b1e4f353@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 3/24/25 08:51, Dimitrios Apostolou wrote:
> On Mon, 24 Mar 2025, Adrian Klaver wrote:
>
>> On 3/24/25 07:24, Dimitrios Apostolou wrote:
>>>  On Sun, 23 Mar 2025, Laurenz Albe wrote:
>>>
>>>>  On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:
>>>>>  Performance issues: (important as my db size is >5TB)
>>>>>
>>>>>  * WAL writes: I didn't manage to avoid writing to the WAL, despite
>>>>>  having
>>>>>     setting wal_level=minimal. I even wrote my own function to
>>>>> ALTER all
>>>>>     tables to UNLOGGED, but failed with "could not change table T to
>>>>>     unlogged because it references logged table".  I'm out of ideas on
>>>>>  this
>>>>>     one.
>>>>
>>>>  You'd have to create an load the table in the same transaction,
>>>> that is,
>>>>  you'd have to run pg_restore with --single-transaction.
>>>
>>>  That would restore the schema from the dump, while I want to create the
>>>  schema from the SQL code in version control.
>>
>>
>> I am not following, from your original post:
>>
>> "
>> ... create a
>> clean database by running the SQL schema definition from version
>> control, and
>> then copy the data for only the tables created.
>>
>> For this case, I choose to run pg_restore --data-only, and run it as
>> the user
>> who owns the database (dbowner), not as a superuser, in order to avoid
>> changes being introduced under the radar.
>> "
>>
>> You are running the process in two steps, where the first does not
>> involve
>> pg_restore. Not sure why doing the pg_restore --data-only portion in
>> single
>> transaction is not possible?
>
> Laurenz informed me that I could avoid writing to the WAL if I "create and
> load the table in a single transaction".

From here:

https://www.postgresql.org/message-id/455d28421ae33c73b73a6f527d2f72816ca5dd29.camel%40cybertec.at

What he said was:

"You'd have to create an load the table in the same transaction, that
is, you'd have to run pg_restore with --single-transaction."

Where I assume he meant '... create and load ...'. That is not the same
as what you are doing below.

> I haven't tried, but here is what I would do to try --single-transaction:
>
> Transaction 1: manually issuing all of CREATE TABLE etc.
>
> Transaction 2: pg_restore --single-transaction --data-only
>
> The COPY command in transaction 2 would still need to write to WAL, since
> it's separate from the CREATE TABLE.
>
> Am I wrong somewhere?
>

>> COPY is all or none(version 17+ caveat(see
>> https://www.postgresql.org/docs/current/sql-copy.html  ON_ERROR)), so
>> if the
>> data dump fails in --single-transaction everything rolls back.
>
> So if I restore all tables, then an error about a "table not found" would
> not roll back already copied tables, since it's not part of a COPY?

If you are following what you show above then the tables and other
objects would be created manually from the version control outside of
pg_restore and on successful completion and commit of that transaction
they would persist until such time as you change them. The second step
pg_restore --single-transaction --data-only is where you could 99% of
the way through and have a failure that rolls back all the data entered
in the tables.

>
>
> Thank you for the feedback,
> Dimitris
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Dimitrios Apostolou 2025-03-24 16:15:01 Re: Experience and feedback on pg_restore --data-only
Previous Message Dimitrios Apostolou 2025-03-24 16:05:52 Re: Experience and feedback on pg_restore --data-only