Re: Scriptable way to validate a pg_dump restore ?

From: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
To: Laura Smith <n5d9xq3ti233xiyif2vp(at)protonmail(dot)ch>
Cc: postgre <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Scriptable way to validate a pg_dump restore ?
Date: 2024-01-29 09:06:07
Message-ID: CANzqJaB3mx3OvNAp81jPB16QJDhqXYWv0Y3iYQ-3SbwiVDKC5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Jan 29, 2024 at 3:12 AM Laura Smith <
n5d9xq3ti233xiyif2vp(at)protonmail(dot)ch> wrote:

> Hi
>
> Let's say I've got a scenario where I'm doing a pg_dump replication rather
> than online streaming, e.g. due to air-gap or whatever.
>
> Is there a scriptable way to validate the restore ? e.g. using doing
> something clever with ctid or something to ensure both the schema and all
> its rows were restored to the same point at which the dump was taken ?
>

1. Since pg_dump creates a logical copy, nothing binary like ctid will work.
2. pg_dump is designed to take a snapshot.If it doesn't, it would fail
for people who do logical replication. However, no one has noticed.

That's kinda like being asked to prove that rocks *always* fall when you
drop them. Either you trust physics, because physics has always worked, or
you must watch every rock, because next time it might not fall. The
analogy is slightly flawed, since we always check the pg_dump and
pg_restore return codes, since something else might impact their function.

But if you still need evidence, here's what I'm doing to verify table and
record counts during a 9.6 -> 14 migration. You'll have to modify it for
your purpose.

Create this table and function beforehand:
CREATE TABLE dba.migration_table_counts (
location text check (location in ('96', '14'))
, table_name text
, row_count bigint
, count_time timestamp without time zone default current_timestamp
, primary key (table_name, location)
);
CREATE OR REPLACE FUNCTION dba.get_table_counts(_p_source TEXT) RETURNS
INTEGER
LANGUAGE plpgsql
AS $func$
DECLARE
r RECORD;
_sql TEXT;
_table_count BIGINT;
BEGIN
FOR r IN select relnamespace::regnamespace::text||'.'||relname as
table_name
from pg_class cla
where relkind = 'r'
and not exists (select 1 -- excludes parent tables
from pg_inherits inh1
where inh1.inhparent = cla.oid)
and relnamespace::regnamespace::text
not in ('pg_catalog', 'information_schema', 'dba')
order by 1
LOOP
_sql := FORMAT('SELECT COUNT(*) FROM %s;', r.table_name);
RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:
SS.MS'), _sql;
EXECUTE _sql INTO _table_count;
--RAISE NOTICE '%', _table_count;
INSERT INTO dba.migration_table_counts (location, table_name,
row_count)
VALUES (_p_source, r.table_name, _table_count);
END LOOP;
RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:SS.MS'),
'Finished';
RETURN 0;
END
$func$;

Run this script in a cron job that executes at the same time as your
pg_dump cron job. Parameters should be for the source database.
#!/bin/bash
declare -gr Server=$1
declare -gr DB=$2
declare -gr Source=$3
if [ -z $Server ]; then echo "Requires a server name."; exit 2; fi
if [ -z $DB ]; then echo "Requires a DB name."; exit 3; fi
if [ -z $Source ]; then echo "Requires a source: 96|14."; exit 4; fi
psql -U postgres -h ${Server} $DB -Xac "DELETE FROM
dba.migration_table_counts WHERE location = '$Source';"
psql -U postgres -h ${Server} $DB -Xac "select * from
dba.get_table_counts('$Source');"

Run the same script on the destination server after the pg_restore is
finished.
Dump the source dba.migration_table_counts then load it into the
destination dba.migration_table_counts.

These two queries run on the destination server will check that all tables
exist in both databases, and that the record counts are the same.

You'll have some slight variations, since the two jobs are in separate
transactions. (Mine won't, since the applications will be shut down, and
pg_hba.conf will block them.)

declare -gr SQL1="
with
c96 as (select table_name, row_count from dba.migration_table_counts
where location = '96'),
c14 as (select table_name, row_count from dba.migration_table_counts
where location = '14')
select c96.*, c14.*
from c96 full join c14 on c96.table_name = c14.table_name
where c96.table_name is null
or c14.table_name is null
order by c96.table_name, c14.table_name;"
psql $DB -ac "$SQL1"

declare -gr SQL2="
with
c96 as (select table_name, row_count from dba.migration_table_counts
where location = '96'),
c14 as (select table_name, row_count from dba.migration_table_counts
where location = '14')
select c96.*, c14.*, c96.row_count - c14.row_count as row_diff
from c96 inner join c14 on c96.table_name = c14.table_name
where c96.row_count != c14.row_count
order by c96.table_name;"
psql $DB -ac "$SQL2"

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bob Jolliffe 2024-01-29 12:08:57 Re: Query performance in 9.6.24 vs 14.10
Previous Message Shaheed Haque 2024-01-29 09:03:48 Re: Scriptable way to validate a pg_dump restore ?