Re: Database corruption in cascaded replica, "pg_xact/003A" doesn't exist, reading as zeroes"

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Dejan Petrovic <dejan(dot)petrovic(at)islonline(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Database corruption in cascaded replica, "pg_xact/003A" doesn't exist, reading as zeroes"
Date: 2018-12-05 15:01:20
Message-ID: 20181205150120.GD3415@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Greetings,

* Dejan Petrovic (dejan(dot)petrovic(at)islonline(dot)com) wrote:
> I believe this is a result of my "broken" procedure for setting up a
> cascaded replica. I would love to know where the issue is.

[...]

> Notes:
> Machines are running on Centos 7, Postgresql 10.2
> DB-1 = master
> DB-2 = replica of DB-1
> DB-3 = replica of DB-2
>
> --------------> DB-3 (new cascaded replica)
> systemctl disable postgresql-10
> systemctl stop postgresql-10
> --------------> DB-1 (master)
> psql -U postgres
> select pg_start_backup('clone',true);
> <------------- DB-3 (cascaded replica)
> rsync -azv -e --delete DB-2:/isl/pgsql10/data /isl/pgsql10/ --exclude pg_wal
> --exclude postgresql.pid
> assert rsync finished
> vi recovery.conf
>     standby_mode = 'on'
>     primary_conninfo = 'host=DB-2'
>     recovery_target_timeline = 'latest'
> -------------> DB-1
> select pg_stop_backup();
> <------------- DB-3
> rsync -azv -e --delete DB-2:/isl/pgsql10/data/pg_wal /isl/pgsql10/data/
> systemctl start postgresql-10
> systemctl enable postgresql-10

If I'm reading this correctly, you're doing a pg_start_backup() on the
primary, but then making a copy of the DB-2 replica (while the primary
and the DB-2 replica are still running..).

In short, no, that doesn't work. Never has. That someone thought they
could create a replica that way is just another good reason to rip out
the broken exclusive-backup mode. I hope this isn't how your regular
backups are done.

I'd strongly suggest that you use existing tools to do this- in
partciular, you should be able to use pg_basebackup against the DB-2
replica to build out your cascaded replica and that'll do all the right
things. Alternatively, using a backup solution like pgbackrest would
allow you to quickly build a replica from your backups instead of from a
running replica.

Thanks!

Stephen

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2018-12-05 15:22:49 Re: simple division
Previous Message Ron 2018-12-05 14:55:00 Re: surprising query optimisation