Re: Point-in-time recovery after failover

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Dylan Luong <Dylan(dot)Luong(at)unisa(dot)edu(dot)au>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Point-in-time recovery after failover
Date: 2018-03-13 11:51:41
Message-ID: 1520941901.2434.3.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Dylan Luong wrote:
> We are on Postgres 9.6 and we have primary/standby wal replication setup for HA.
>
> I am trying to perform a point-in-time recovery after a failover has occurred.
>
> I extracted the base backups (tar files) to the data directory and extracted the all the archive backups (P1 and S1) to the archive directory.
> Followed the steps from
> Created recovery.conf with the following, but the recovery only recovers, to the point before the failover and comes online.
>
> Example of my recovery.conf :
> restore_command = 'cp /pg_backup/backup/archive/%f %p'
> recovery_target_time = '2018-03-13 13:54:00'
> recovery_target_inclusive = 'false'
> recovery_target_timeline = '6' (I also tried revovery_target_timeline = ‘lastest)
>
> Is it possible to recovery to PITR after a failover using base backup from before failover and both archive backups from P1 and S1?
>
>
> Here is the log output :
>
> 018-03-13 20:46:53 ACDT [154912]: [1-1] db=,user= app=,host= LOG: database system was interrupted; last known up at 2018-03-1
> 3 01:31:25 ACDT
> 2018-03-13 20:46:53 ACDT [154912]: [2-1] db=,user= app=,host= LOG: restored log file "00000006.history" from archive
> cp: cannot stat ‘/pg_backup/backup/archive/00000007.history’: No such file or directory
> 2018-03-13 20:46:53 ACDT [154912]: [3-1] db=,user= app=,host= LOG: starting point-in-time recovery to 2018-03-13 13:54:00+10:3
> 0
> 2018-03-13 20:46:53 ACDT [154912]: [4-1] db=,user= app=,host= LOG: restored log file "00000006.history" from archive
> cp: cannot stat ‘/pg_backup/backup/archive/000000060000011100000087’: No such file or directory
> 2018-03-13 20:46:53 ACDT [154912]: [5-1] db=,user= app=,host= LOG: restored log file "000000050000011100000087" from archive
> cp: cannot stat ‘/pg_backup/backup/archive/00000005.history’: No such file or directory
> 2018-03-13 20:46:53 ACDT [154912]: [6-1] db=,user= app=,host= LOG: redo starts at 111/87000028
> 2018-03-13 20:46:53 ACDT [154912]: [7-1] db=,user= app=,host= LOG: consistent recovery state reached at 111/870B4418
> 2018-03-13 20:46:53 ACDT [154908]: [3-1] db=,user= app=,host= LOG: database system is ready to accept read only connections
> cp: cannot stat ‘/pg_backup/backup/archive/000000060000011100000088’: No such file or directory
> 2018-03-13 20:46:53 ACDT [154912]: [8-1] db=,user= app=,host= LOG: restored log file "000000050000011100000088" from archive
> cp: cannot stat ‘/pg_backup/backup/archive/000000060000011100000089’: No such file or directory
> 2018-03-13 20:46:53 ACDT [154912]: [9-1] db=,user= app=,host= LOG: restored log file "000000050000011100000089" from archive
> cp: cannot stat ‘/pg_backup/backup/archive/00000006000001110000008A’: No such file or directory
> 2018-03-13 20:46:53 ACDT [154912]: [10-1] db=,user= app=,host= LOG: restored log file "00000005000001110000008A" from archive
[...]
> cp: cannot stat ‘/pg_backup/backup/archive/0000000600000111000000AD’: No such file or directory
> 2018-03-13 20:46:57 ACDT [154912]: [45-1] db=,user= app=,host= LOG: restored log file "0000000500000111000000AD" from archive
> cp: cannot stat ‘/pg_backup/backup/archive/0000000600000111000000AE’: No such file or directory
> cp: cannot stat ‘/pg_backup/backup/archive/0000000500000111000000AE’: No such file or directory
> 2018-03-13 20:46:57 ACDT [154912]: [46-1] db=,user= app=,host= LOG: redo done at 111/ADFFE160
> 2018-03-13 20:46:57 ACDT [154912]: [47-1] db=,user= app=,host= LOG: last completed transaction was at log time 2018-03-13 13:4
> 0:02.36347+10:30
>
>
> Here is what is in the archive directory after extracting from the archive backups and after database comes online:
>
[...]
> -rw-------. 1 postgres postgres 16777216 Mar 13 13:40 0000000500000111000000AA
> -rw-------. 1 postgres postgres 16777216 Mar 13 13:40 0000000500000111000000AB
> -rw-------. 1 postgres postgres 16777216 Mar 13 13:40 0000000500000111000000AC
> -rw-------. 1 postgres postgres 16777216 Mar 13 13:40 0000000500000111000000AD
> -rw-------. 1 postgres postgres 16777216 Mar 13 13:48 0000000500000111000000B1.partial
> -rw-------. 1 postgres postgres 16777216 Mar 13 13:49 0000000600000111000000B1
> -rw-------. 1 postgres postgres 16777216 Mar 13 14:04 0000000600000111000000B2
> -rw-------. 1 postgres postgres 221 Mar 13 13:48 00000006.history
> -rw-------. 1 postgres postgres 16777216 Mar 13 20:48 0000000700000111000000AD
> -rw-------. 1 postgres postgres 271 Mar 13 20:46 00000007.history

You are missing 0000000500000111000000AE or 0000000600000111000000AE.

One of these files should be there.

I suspect that there is a problem with your WAL archiving in connection with failover.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Melvin Davidson 2018-03-13 13:09:02 Re: Programmatically duplicating a schema
Previous Message Dylan Luong 2018-03-13 11:28:26 Point-in-time recovery after failover