Re: how to decrease the promotion time when performing a multiple failovers.....

From: Vladimir Borodin <root(at)simply(dot)name>
To: Shay Cohavi <cohavisi(at)gmail(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: how to decrease the promotion time when performing a multiple failovers.....
Date: 2016-01-01 10:55:40
Message-ID: 3DA5A411-E428-4F6A-9487-645407205414@simply.name
Lists: pgsql-admin pgsql-general


> On 1 Jan 2016, at 9:29, Shay Cohavi <cohavisi(at)gmail(dot)com> wrote:
>
> Hi,
> I have a PostgreSQL 9.3 setup with 2 nodes (active/standby with streaming replication & continuous archiving).
> I have created 2 scripts, failover & failback, in order to perform a switchover between the DB servers:
> 1. failover - create a trigger file in order to promote the new primary.
> 2. failback - perform a base backup as mentioned in :
> a. start the backup on the primary.
> b. stop the failed node (the DB directory on the failed node is not deleted).
> c. perform an rsync between the nodes.
> d. stop the backup on the primary.
> e. perform an rsync of pg_xlog.
> f. create a recovery.conf:
>
> standby_mode = 'on'
> primary_conninfo = 'host=10.50.1.153 port=5432 user=usr password=pass'
> restore_command = 'scp 10.50.1.153:/home/postgres/archive/%f %p'
> trigger_file = '/home/postgres/databases/fabrix/trigger'
> archive_cleanup_command = 'ssh 10.50.1.153 /home/postgres/pg_utils/archive_cleanup.sh %r'
>
> g. start the failed node as secondary.
>
> the switchover method:
> 1. stop the primary node.
> 2. promote the secondary node (failover.sh).
> 3. perform failback on the failed node.
> 4. start the failed node.
>
> this method works great!
>
>
> But if I perform multiple switchovers (>20), each promotion of the new primary (via the trigger file) takes longer, because it searches through the timelines in the archive.
> Is there any way to prevent the multiple 'scp' archive commands, which make the promotion take longer?
>
> for example:
>
> [2015-12-12 20:35:10.769 IST] LOG: trigger file found: /home/postgres/databases/fabrix/trigger
> [2015-12-12 20:35:10.769 IST] FATAL: terminating walreceiver process due to administrator command
> scp: /home/postgres/archive/0000009400000002000000DC: No such file or directory
> [2015-12-12 20:35:10.893 IST] LOG: record with zero length at 2/DC000168
> [2015-12-12 20:35:10.893 IST] LOG: redo done at 2/DC000100
> scp: /home/postgres/archive/0000009400000002000000DC: No such file or directory
> scp: /home/postgres/archive/0000009300000002000000DC: No such file or directory
> scp: /home/postgres/archive/0000009200000002000000DC: No such file or directory
> .
> .
> .
>
> scp: /home/postgres/archive/0000009100000002000000DC: No such file or directory
> scp: /home/postgres/archive/0000009000000002000000DC: No such file or directory
> scp: /home/postgres/archive/00000095.history: No such file or directory
> [2015-12-12 20:35:11.801 IST] LOG: selected new timeline ID: 149
> [2015-12-12 20:35:11.931 IST] LOG: restored log file "00000094.history" from archive
> [2015-12-12 20:35:12.173 IST] LOG: archive recovery complete
> [2015-12-12 20:35:12.181 IST] LOG: database system is ready to accept connections
> [2015-12-12 20:35:12.181 IST] LOG: autovacuum launcher started
>
> This can take at least 1 minute, or more.
>
> Is there any way to skip the timeline searching in order to decrease the promotion time?

You should add recovery_target_timeline = 'latest' to your recovery.conf [0].

[0] http://www.postgresql.org/docs/9.3/static/warm-standby.html
<...>
If you plan to have multiple standby servers for high availability purposes, set recovery_target_timeline to latest, to make the standby server follow the timeline change that occurs at failover to another standby.
<...>
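
For illustration, the recovery.conf quoted above might look like the following with the suggested setting added. This is only a sketch: every value except the last line is copied from the original message, and whether the setting actually shortens promotion in this particular setup is not confirmed here.

```
# recovery.conf on the standby (values taken from the quoted message)
standby_mode = 'on'
primary_conninfo = 'host=10.50.1.153 port=5432 user=usr password=pass'
restore_command = 'scp 10.50.1.153:/home/postgres/archive/%f %p'
trigger_file = '/home/postgres/databases/fabrix/trigger'
archive_cleanup_command = 'ssh 10.50.1.153 /home/postgres/pg_utils/archive_cleanup.sh %r'

# Suggested addition: follow the newest timeline found in the archive
# instead of staying on the timeline of the base backup.
recovery_target_timeline = 'latest'
```

recovery.conf is read only at server start, so the standby would need a restart for the new line to take effect.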

>
>
> Thanks,
> ShayC
>

--
May the force be with you…
https://simply.name
