From: | Mads(dot)Tandrup(at)schneider-electric(dot)com |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Timeline switch problem with streaming replication with 3 nodes |
Date: | 2012-09-24 12:37:33 |
Message-ID: | OF80BBB332.B495F5C6-ONC1257A83.00430216-C1257A83.00455AFC@apcc.com |
Lists: | pgsql-general |
Hi all,
I've set up three PostgreSQL nodes: one master and two slaves. They are
configured for streaming replication with synchronous replication on. I've
set up a virtual IP that points to the current master node.
When I kill the master node, the slave that was synchronous gets promoted
to master and takes over the shared virtual IP.
But sometimes the other slave doesn't accept the switch, and instead the
log on that slave says:
2012-09-24 10:45:06 GMT 4663 FATAL: replication terminated by primary server
2012-09-24 10:45:06 GMT 4662 LOG: record with zero length at 0/200009E8
2012-09-24 10:45:06 GMT 10209 FATAL: could not connect to the primary server: could not connect to server: Connection refused
    Is the server running on host "10.216.73.60" and accepting
    TCP/IP connections on port 5432?
2012-09-24 10:45:11 GMT 10272 FATAL: could not connect to the primary server: FATAL: recovery is still in progress, can't accept WAL streaming connections
2012-09-24 10:45:16 GMT 10326 FATAL: timeline 10 of the primary does not match recovery target timeline 9
2012-09-24 10:45:21 GMT 10388 FATAL: timeline 10 of the primary does not match recovery target timeline 9
2012-09-24 10:45:26 GMT 10451 FATAL: timeline 10 of the primary does not match recovery target timeline 9
...
And it continues to repeat the last line.
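To spot this state automatically rather than by reading the log, something like the following could work. It is only a minimal sketch in Python: the function name `find_timeline_mismatch` is made up for illustration, and the pattern assumes the message wording is exactly as in the excerpt above.

```python
import re

# Message text as it appears in the slave's log excerpt above.
TIMELINE_MISMATCH = re.compile(
    r"timeline (\d+) of the primary does not match recovery target timeline (\d+)"
)


def find_timeline_mismatch(log_text):
    """Return (primary_timeline, target_timeline) from the last mismatch
    message in log_text, or None if the message does not occur.

    Hypothetical helper, not part of PostgreSQL.
    """
    # Mailed or copied logs may be hard-wrapped mid-message, so collapse
    # all runs of whitespace to single spaces before matching.
    flat = " ".join(log_text.split())
    matches = TIMELINE_MISMATCH.findall(flat)
    if not matches:
        return None
    primary, target = matches[-1]
    return int(primary), int(target)
```

Run against the slave log above, this would report that the primary is on timeline 10 while the slave is still targeting timeline 9.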
The new master says:
2012-09-24 10:45:06 GMT 8394 FATAL: replication terminated by primary server
2012-09-24 10:45:06 GMT 8393 LOG: record with zero length at 0/200009E8
2012-09-24 10:45:11 GMT 8393 LOG: trigger file found: /tmp/postgresql_trigger
2012-09-24 10:45:11 GMT 8393 LOG: redo done at 0/20000990
2012-09-24 10:45:11 GMT 8393 LOG: last completed transaction was at log time 2012-09-24 10:45:01.917175+00
2012-09-24 10:45:11 GMT 8393 LOG: selected new timeline ID: 10
2012-09-24 10:45:11 GMT 10741 [unknown] FATAL: recovery is still in progress, can't accept WAL streaming connections
2012-09-24 10:45:12 GMT 8393 LOG: archive recovery complete
2012-09-24 10:45:12 GMT 8391 LOG: database system is ready to accept connections
2012-09-24 10:45:12 GMT 10743 LOG: autovacuum launcher started
The recovery.conf is:
standby_mode = 'on'
primary_conninfo = 'host=10.216.73.60 port=5432 user=root password=onyx application_name=10.216.73.195'
recovery_target_timeline = 'latest'
trigger_file = '/tmp/postgresql_trigger'
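
To compare these settings across the two slaves, a small script along these lines might help. This is a rough sketch in Python, not an official tool: `parse_recovery_conf` is a hypothetical helper, and it assumes each setting sits on one line in the `name = 'value'` form shown above.

```python
import re

# One setting per line: name = 'value', with an optional trailing comment.
# A doubled quote ('') inside the value stands for a literal single quote.
SETTING = re.compile(r"^\s*([A-Za-z_]+)\s*=\s*'((?:[^']|'')*)'\s*(?:#.*)?$")


def parse_recovery_conf(text):
    """Return the quoted settings found in text as a dict of name -> value.

    Hypothetical helper; lines that do not match the simple
    name = 'value' form (blank lines, comments) are skipped.
    """
    settings = {}
    for line in text.splitlines():
        match = SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2).replace("''", "'")
    return settings
```

With the file above, `parse_recovery_conf(text)["recovery_target_timeline"]` would give `latest`, which makes it easy to confirm that both slaves are configured the same way.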
I found an earlier discussion
(http://archives.postgresql.org/pgsql-general/2011-12/msg00553.php) of a
similar issue. It suggests sharing WAL files as the solution, but I thought
the point of streaming replication was that I would not need shared
storage.
Is that the only solution, or is there another way?
Best regards,
Mads