Re: Resync second slave to new master

From: Yavuz Selim Sertoğlu <yavuzselimsertoglu(at)gmail(dot)com>
To: Dylan Luong <Dylan(dot)Luong(at)unisa(dot)edu(dot)au>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Resync second slave to new master
Date: 2018-03-08 07:48:29
Message-ID: CAJ7QKnZKHCaWxohMMD5b9bDcynsnJhz6eKa65HKtN-vr+18d+Q@mail.gmail.com
Lists: pgsql-general

If it is not already set, could you add the parameter
recovery_target_timeline = 'latest'
to the recovery.conf file on slave2?
https://www.postgresql.org/docs/devel/static/recovery-target-settings.html
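
To make the suggestion concrete, here is a minimal sketch of what the recovery.conf on slave2 might look like on 9.6. The host value is a placeholder, not taken from your setup, and your existing restore_command line should stay exactly as it is (it is left out here because I do not know its exact form). As far as I understand, without recovery_target_timeline the standby stays on the timeline it was already following (5 in your logs), while 'latest' lets it pick up 00000006.history and follow the new master onto timeline 6.

```
# recovery.conf on slave2 (sketch; host is a placeholder)
standby_mode = 'on'
primary_conninfo = 'host=NEW_MASTER_HOST port=5432 user=postgres'
# keep your existing restore_command line unchanged
recovery_target_timeline = 'latest'
```

The file is only read at startup, so slave2 needs a restart after the change.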

2018-03-08 10:41 GMT+03:00 Dylan Luong <Dylan(dot)Luong(at)unisa(dot)edu(dot)au>:

> Hi Michael,
>
> I tested the failover today and the slave 2 failed to resync with the new
> master (old slave1).
>
> After I promoted the slave1 to become master, I was able to use pg_rewind
> on the old master and bring it back as new slave.
>
> I then stopped slave2 and ran pg_rewind on slave2 against the new master;
> it reported that no rewind was required:
>
> $ pg_rewind -D /var/lib/pgsql/9.6/data --source-server="host=xxxxx.xxx.xxxx
> port=5432 user=postgres"
> servers diverged at WAL position 1BB/AB000098 on timeline 5
> no rewind required
>
> So I then updated the recovery.conf on slave2 with primary_conninfo equal
> to the new master IP.
> When starting up postgres, it failed with the following error in the logs:
>
> database system was shut down in recovery at 2018-03-08 17:52:10 ACDT
> 2018-03-08 17:56:27 ACDT [23026]: [2-1] db=,user= app=,host= LOG:
> entering standby mode
> cp: cannot stat '/pg_backup/backup/archive /00000005.history': No such
> file or directory
> cp: cannot stat '/pg_backup/backup/archive /00000005000001BB000000AB': No
> such file or directory
> 2018-03-08 17:56:27 ACDT [23026]: [3-1] db=,user= app=,host= LOG:
> consistent recovery state reached at 1BB/AB000098
> 2018-03-08 17:56:27 ACDT [23026]: [4-1] db=,user= app=,host= LOG: record
> with incorrect prev-link 1B9/73000040 at 1BB/AB000098
> 2018-03-08 17:56:27 ACDT [23024]: [3-1] db=,user= app=,host= LOG:
> database system is ready to accept read only connections
> 2018-03-08 17:56:27 ACDT [23032]: [1-1] db=,user= app=,host= LOG: started
> streaming WAL from primary at 1BB/AB000000 on timeline 5
> 2018-03-08 17:56:27 ACDT [23032]: [2-1] db=,user= app=,host= LOG:
> replication terminated by primary server
> 2018-03-08 17:56:27 ACDT [23032]: [3-1] db=,user= app=,host= DETAIL: End
> of WAL reached on timeline 5 at 1BB/AB000098.
> cp: cannot stat '/pg_backup/backup/archive_sync/00000005000001BB000000AB':
> No such file or directory
> 2018-03-08 17:56:27 ACDT [23032]: [4-1] db=,user= app=,host= LOG:
> restarted WAL streaming at 1BB/AB000000 on timeline 5
> 2018-03-08 17:56:27 ACDT [23032]: [5-1] db=,user= app=,host= LOG:
> replication terminated by primary server
> 2018-03-08 17:56:27 ACDT [23032]: [6-1] db=,user= app=,host= DETAIL: End
> of WAL reached on timeline 5 at 1BB/AB000098.
>
>
> On the new master in the /pg_backup/backup/archive folder I can see a file
> 00000005000001BB000000AB.partial
> E.g.
> ls -l
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:48
> 00000005000001BB000000AB.partial
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49
> 00000006000001BB000000AB
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49
> 00000006000001BB000000AC
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49
> 00000006000001BB000000AD
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49
> 00000006000001BB000000AE
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49
> 00000006000001BB000000AF
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49
> 00000006000001BB000000B0
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49
> 00000006000001BB000000B1
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:49
> 00000006000001BB000000B2
> -rw-------. 1 postgres postgres 16777216 Mar 8 16:50
> 00000006000001BB000000B3
> -rw-------. 1 postgres postgres 16777216 Mar 8 17:01
> 00000006000001BB000000B4
> -rw-------. 1 postgres postgres 16777216 Mar 8 17:14
> 00000006000001BB000000B5
> -rw-------. 1 postgres postgres 218 Mar 8 16:48 00000006.history
>
> Any ideas?
>
> Dylan
>
> -----Original Message-----
> From: Michael Paquier [mailto:michael(at)paquier(dot)xyz]
> Sent: Tuesday, 6 March 2018 5:55 PM
> To: Dylan Luong <Dylan(dot)Luong(at)unisa(dot)edu(dot)au>
> Cc: pgsql-generallists.postgresql.org <pgsql-general(at)lists(dot)postgresql(dot)org>
> Subject: Re: Resync second slave to new master
>
> On Tue, Mar 06, 2018 at 06:00:40AM +0000, Dylan Luong wrote:
> > So every time after promoting Slave1 to become master (either manually
> > or automatically), just stop Slave2 and run pg_rewind on Slave2 against
> > the new master (old Slave1). And when the old master server is available
> > again, use pg_rewind on that server as well against the new master to
> > return to the original configuration.
>
> Yes. That's exactly the idea. Running pg_rewind on the old master will
> be necessary anyway because you need to stop it cleanly once, which will
> cause it to generate WAL records at least for the shutdown checkpoint,
> while doing it on slave 2 may be optional, still safer to do.
> --
> Michael
>
>
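
As a side note on reading the logs and the archive listing quoted above: a WAL segment file name is three 8-digit hex fields (timeline, log number, segment number), so the timeline a standby is asking for can be read straight off the name. The small Python sketch below is only an illustration; the function names are made up for this mail, and it assumes the default 16 MB segment size, which matches the 16777216-byte files in the listing.

```python
# Illustrative helpers (hypothetical names), assuming 16 MB WAL segments.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_wal_name(name):
    """Split a WAL file name into (timeline, log, segment) integers.

    A suffix such as ".partial" is ignored.
    """
    base = name.split(".", 1)[0]
    if len(base) != 24:
        raise ValueError("not a WAL segment name: %r" % name)
    return int(base[0:8], 16), int(base[8:16], 16), int(base[16:24], 16)


def segment_for_lsn(timeline, lsn, seg_size=WAL_SEGMENT_SIZE):
    """Return the segment file name that contains `lsn` on `timeline`."""
    high, low = lsn.split("/")
    position = (int(high, 16) << 32) | int(low, 16)
    segno = position // seg_size
    segs_per_log = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_log, segno % segs_per_log)


if __name__ == "__main__":
    # The divergence point reported by pg_rewind, on the old and new timelines.
    print(segment_for_lsn(5, "1BB/AB000098"))
    print(segment_for_lsn(6, "1BB/AB000098"))
    print(parse_wal_name("00000005000001BB000000AB.partial"))
```

Applied to 1BB/AB000098, this gives segment 00000005000001BB000000AB on timeline 5, which is the file the standby's restore_command fails to find, while the archive holds that segment only as a .partial file on timeline 5 and as a complete file on timeline 6.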
