| From: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> | 
|---|---|
| To: | Laura Smith <n5d9xq3ti233xiyif2vp(at)protonmail(dot)ch>, postgre <pgsql-general(at)lists(dot)postgresql(dot)org> | 
| Subject: | Re: Scriptable way to validate a pg_dump restore ? | 
| Date: | 2024-01-29 16:52:16 | 
| Message-ID: | e0fa9915-9356-45e6-891f-d24b6a274e7e@aklaver.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general | 
On 1/29/24 00:12, Laura Smith wrote:
> Hi
> 
> Let's say I've got a scenario where I'm doing a pg_dump replication rather than online streaming, e.g. due to air-gap or whatever.
> 
> Is there a scriptable way to validate the restore ?  e.g. using doing something clever with ctid or something to ensure both the schema and all its rows were restored to the same point at which the dump was taken ?
Assuming you are using pg_restore on a non-plain text dump file then
from pg_restore.c:
* pg_restore.c
*      pg_restore is an utility extracting postgres database definitions
  *      from a backup archive created by pg_dump using the archiver
  *      interface.
  *
  *      pg_restore will read the backup archive and
  *      dump out a script that reproduces
  *      the schema of the database in terms of
  *                user-defined types
  *                user-defined functions
  *                tables
  *                indexes
  *                aggregates
  *                operators
  *                ACL - grant/revoke
  *
  * the output script is SQL that is understood by PostgreSQL
  *
  * Basic process in a restore operation is:
  *
  *      Open the Archive and read the TOC.
  *      Set flags in TOC entries, and *maybe* reorder them.
  *      Generate script to stdout
  *      Exit
Then:
pg_restore -l -f <output_file> <dump_file>
to get the TOC mentioned above. Walk through that to verify schema is 
the same in the restored database.
This will not tell you whether all the data was transferred. You will 
either have to trust from pg_dump.c:
  *      pg_dump will read the system catalogs in a database and dump out a
  *      script that reproduces the schema in terms of SQL that is 
understood
  *      by PostgreSQL
  *
  *      Note that pg_dump runs in a transaction-snapshot mode transaction,
  *      so it sees a consistent snapshot of the database including system
  *      catalogs. However, it relies in part on various specialized backend
  *      functions like pg_get_indexdef(), and those things tend to look at
  *      the currently committed state.  So it is possible to get 'cache
  *      lookup failed' error if someone performs DDL changes while a 
dump is
  *      happening. The window for this sort of thing is from the 
acquisition
  *      of the transaction snapshot to getSchemaData() (when pg_dump 
acquires
  *      AccessShareLock on every table it intends to dump). It isn't 
very large,
  *      but it can happen.
Or come up with way to capture the state of the data at the time of dump 
and then compare to restored database. Something like Ron posted.
> 
> Thanks !
> 
> Laura
> 
> 
-- 
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Ron Johnson | 2024-01-29 16:52:54 | Re: Query performance in 9.6.24 vs 14.10 | 
| Previous Message | Ron Johnson | 2024-01-29 14:18:37 | Re: Query performance in 9.6.24 vs 14.10 |