Re: Scriptable way to validate a pg_dump restore ?

From: Shaheed Haque <shaheedhaque(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: Laura Smith <n5d9xq3ti233xiyif2vp(at)protonmail(dot)ch>, postgre <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Scriptable way to validate a pg_dump restore ?
Date: 2024-01-29 17:28:32
Message-ID: CAHAc2jfmPh1u1hpw2dNS+VcLQsapAU9D6sdSf1FDS5WgF-Wvpg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, 29 Jan 2024, 22:52 Adrian Klaver, <adrian(dot)klaver(at)aklaver(dot)com> wrote:

> On 1/29/24 00:12, Laura Smith wrote:
> > Hi
> >
> > Let's say I've got a scenario where I'm doing a pg_dump replication
> rather than online streaming, e.g. due to air-gap or whatever.
> >
> > Is there a scriptable way to validate the restore ? e.g. using doing
> something clever with ctid or something to ensure both the schema and all
> its rows were restored to the same point at which the dump was taken ?
>
> Assuming you are using pg_restore on a non-plain text dump file then
> from pg_restore.c:
>
> * pg_restore.c
>
> * pg_restore is an utility extracting postgres database definitions
> * from a backup archive created by pg_dump using the archiver
> * interface.
> *
> * pg_restore will read the backup archive and
> * dump out a script that reproduces
> * the schema of the database in terms of
> * user-defined types
> * user-defined functions
> * tables
> * indexes
> * aggregates
> * operators
> * ACL - grant/revoke
> *
> * the output script is SQL that is understood by PostgreSQL
> *
> * Basic process in a restore operation is:
> *
> * Open the Archive and read the TOC.
> * Set flags in TOC entries, and *maybe* reorder them.
> * Generate script to stdout
> * Exit
>
> Then:
>
> pg_restore -l -f <output_file> <dump_file>
>
> to get the TOC mentioned above. Walk through that to verify schema is
> the same in the restored database.
>
> This will not tell you whether all the data was transferred. You will
> either have to trust from pg_dump.c:
>
> * pg_dump will read the system catalogs in a database and dump out a
> * script that reproduces the schema in terms of SQL that is
> understood
> * by PostgreSQL
> *
> * Note that pg_dump runs in a transaction-snapshot mode transaction,
> * so it sees a consistent snapshot of the database including system
> * catalogs. However, it relies in part on various specialized
> backend
> * functions like pg_get_indexdef(), and those things tend to look at
> * the currently committed state. So it is possible to get 'cache
> * lookup failed' error if someone performs DDL changes while a
> dump is
> * happening. The window for this sort of thing is from the
> acquisition
> * of the transaction snapshot to getSchemaData() (when pg_dump
> acquires
> * AccessShareLock on every table it intends to dump). It isn't
> very large,
> * but it can happen.
>
> Or come up with way to capture the state of the data at the time of dump
> and then compare to restored database. Something like Ron posted.
>

Right, for me, state, not just record count is what I'm interested in (for
the initial full table copy part of replication). So, given the explanation
about the possible per-table window, is there some property of the table
that could be used to confirm that a table has made it across?

I guess there is such a thing since the following incremental syncing would
presumably need it. I had hoped the LSN was this thing, but confirmation
would be great.

Thanks, Shaheed

> >
> > Thanks !
> >
> > Laura
> >
> >
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2024-01-29 17:57:00 Re: Scriptable way to validate a pg_dump restore ?
Previous Message Ron Johnson 2024-01-29 16:52:54 Re: Query performance in 9.6.24 vs 14.10