From: | ning chan <ninchan8328(at)gmail(dot)com> |
---|---|
To: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Streaming Replication Failover |
Date: | 2013-01-17 05:17:30 |
Message-ID: | CAG0k5vDu=qkKBWWa=jiSDxhXk6jww3-vPKHLQYq=aTzq9NcF8w@mail.gmail.com |
Lists: | pgsql-general |
Hi,
I have a cluster of 3 nodes: Standby A streams from the Primary, and
Standby B streams from Standby A (cascading).
I failed over the cluster:
1) stopped the Primary
2) promoted Standby A
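For reference, the two steps can be sketched roughly as below. This is only
an illustration: the data directory is the one used in the rsync command in
this message, while pg_ctl being on the PATH, the "fast" shutdown mode, and
the helper names (run, stop_primary, promote_standby) are assumptions made
for the sketch. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the failover steps; nothing is executed unless DRY_RUN=0.
PGDATA=/opt/postgres/9.2/data   # path taken from the rsync command
DRY_RUN=1                       # assumption: print commands by default

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# 1) On the Primary: shut the server down ("fast" mode is an assumption).
stop_primary() {
    run pg_ctl -D "$PGDATA" stop -m fast
}

# 2) On Standby A: end recovery so it starts accepting writes.
promote_standby() {
    run pg_ctl -D "$PGDATA" promote
}
```

stop_primary is meant to be called on the Primary host and promote_standby
on Standby A.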
Now I see from the syslog on Standby B that it is complaining about a
timeline mismatch.
Replication Status from Primary
=============================================
|Parameters | Value |
=============================================
|backend_start | 2013-01-16 23:05:48 |
|pid | 17851 |
|usesysid | 10 |
|usename | postgres |
|application_name | StandbyA |
|client_addr | 10.89.94.31 |
|client_hostname | |
|client_port | 43558 |
|state | streaming |
|sent_location | 0/1EAC3E68 |
|write_location | 0/1EAC3E68 |
|flush_location | 0/1EAC3E68 |
|replay_location | 0/1EAC3E68 |
|sync_priority | 0 |
|sync_state | async |
=============================================
Replication Status from Standby A
=============================================
|Parameters | Value |
=============================================
|backend_start | 2013-01-16 23:06:56 |
|pid | 12320 |
|usesysid | 10 |
|usename | postgres |
|application_name | StandByB |
|client_addr | 10.89.94.29 |
|client_hostname | |
|client_port | 48214 |
|state | streaming |
|sent_location | 0/1EAC3E68 |
|write_location | 0/1EAC3E68 |
|flush_location | 0/1EAC3E68 |
|replay_location | 0/1EAC3E68 |
|sync_priority | 0 |
|sync_state | async |
=============================================
Now fail over the Primary.
From the Standby A syslog:
Jan 16 23:08:12 se032c-94-31 postgres[12316]: [3-1] 12316FATAL:
replication terminated by primary server
Jan 16 23:08:12 se032c-94-31 postgres[12312]: [5-1] 12312LOG: redo starts
at 0/1EAC3E68
From the Standby B syslog:
Jan 16 23:09:48 localhost postgres[3932]: [5-1] LOG: redo starts at
0/1EAC3E68
Now, as soon as I promoted Standby A, I see that replication between A & B
is broken. The Standby B syslog shows the following:
Jan 16 23:11:28 localhost postgres[3945]: [2-1] FATAL: timeline 15 of the
primary does not match recovery target timeline 14
Now my question is: while A & B are in sync, why does promoting A break the
replication?
To resolve the problem, I need to stop the engine on B, rsync from A,
and start the B engine again.
rsync -a --progress --exclude postgresql.conf --exclude recovery.done
--exclude pg_hba.conf root@10.89.94.31:/opt/postgres/9.2/data/*
/opt/postgres/9.2/data
Do I need to sync the whole data directory from A? I have a small DB now (2
tables with only a few rows). This may take a long time if I have a much
larger DB. Is there any shortcut? Why do I need to do the rsync when A & B
were originally in sync?
Thanks~
Ning