Re: Would you add a --dry-run to pg_restore?

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Edmundo Robles <edmundo(at)sw-argos(dot)com>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Would you add a --dry-run to pg_restore?
Date: 2017-08-03 12:58:02
Message-ID: CAHyXU0xoDeZ7W+_Rb9WFjtfjGP_TZ=v-_2=zwQUANOd+5aUw9Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Aug 2, 2017 at 11:49 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Edmundo Robles <edmundo(at)sw-argos(dot)com> writes:
>> I mean, to verify the integrity of backup i do:
>> gunzip -c backup_yesterday.gz | pg_restore -d my_database && echo
>> "backup_yesterday is OK"
>
>> but my_database's size, uncompresed, is too big more than 15G and
>> sometimes i have no space to restore it, so always i must declutter my
>> disk first.
>
>> Will be great to have a dry run option, because the time to verify
>> reduces a lot and will save space on disk, because just execute with no
>> write to disk.
>
> What do you imagine a dry run option would do?
>
> If you just want to see if the file contains obvious corruption,
> you could do
>
> pg_restore file >/dev/null
>
> and see if it prints any complaints on stderr. If you want to have
> confidence that the file would actually restore (and that there aren't
> e.g. unique-index violations or foreign-key violations in the data),
> I could imagine a mode where pg_restore wraps its output in "begin" and
> "rollback". But that's not going to save any disk space, or time,
> compared to doing a normal restore into a scratch database.
>
> I can't think of any intermediate levels of verification that wouldn't
> involve a huge amount of work to implement ... and they'd be unlikely
> to catch interesting problems in practice. For instance, I doubt that
> syntax-checking but not executing the SQL coming out of pg_restore would
> be worth the trouble. If an archive is corrupt enough that it contains
> bad SQL, it probably has problems that pg_restore would notice anyway.
> Most of the restore failures that we hear about in practice would not be
> detectable without actually executing the commands, because they involve
> problems like issuing commands in the wrong order.

The vast majority of my restore issues are dependency problems (for
example, postgis extension not being present). A distant second place
would be pg_restore's inability to do things in the proper order or
gaps in the dump feature itself (for example, a cast between two built
in types, at least back in the day).

A good reasonable test for all of those cases with the current tools
is to do a schema only restore (which should not take long in most
cases). If you get past that step, there is an exceptionally high
probability that the restore will succeed sans some controllable
factors like running out of space.

There are some rare known considerations that could a data load to
fail. For example, a unique index on floating point can dump but not
load if two binary differentiated values render to the same string.
I've never seen this in practice however. So I'd argue to just use
that (schema only) feature for pre-load verification if you're
paranoid.

merlin

In response to

Browse pgsql-general by date

  From Date Subject
Next Message armand pirvu 2017-08-03 14:28:01 Re: unexpected pageaddr
Previous Message Rory Campbell-Lange 2017-08-03 09:02:57 Re: Would you add a --dry-run to pg_restore?