From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Standby promotion does not work |
Date: | 2011-04-10 20:48:09 |
Message-ID: | 4DA21789.2090403@agliodbs.com |
Lists: | pgsql-hackers |
All,
So I've finally been able to do some testing, and I'll report that
currently there is no way I've found to get existing standbys to subscribe
to a new master.
No matter what I do in recovery.conf, it results in errors and failure
to replicate.
Test setup:
hosts: master1, master2, replica1
replica1 and master2 are subscribed to master1
First, master1 is shut down.
Second, master2 is promoted via "pg_ctl promote".
So, the original recovery.conf on replica1:
#autogenerated recovery.conf file. do not edit
standby_mode = 'on'
primary_conninfo = 'host=master1 port=5432 user=replication'
trigger_file = '/var/log/pgpool/trigger/trigger_file1'
restore_command = 'scp master1:/usr/local/pgsql/wal_share/%f %p'
recovery_target_timeline = 'latest'
This is changed to:
#autogenerated recovery.conf file. do not edit
standby_mode = 'on'
primary_conninfo = 'host=master2 port=5432 user=replication'
trigger_file = '/var/log/pgpool/trigger/trigger_file1'
restore_command = 'scp master1:/usr/local/pgsql/wal_share/%f %p'
recovery_target_timeline = 'latest'
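For anyone scripting this failover step, here is a minimal sketch of the edit, assuming the intended change is to repoint primary_conninfo from master1 to the newly promoted master2. The helpers parse_recovery_conf and repoint_recovery_conf are hypothetical (not part of PostgreSQL or pgpool), and they only handle the simple key = 'value' lines used above. Automating the edit does nothing about the timeline error reported next.

```python
import re

# Matches simple recovery.conf lines of the form: key = 'value'
SETTING_RE = re.compile(r"^\s*(\w+)\s*=\s*'(.*)'\s*$")


def parse_recovery_conf(text):
    """Return a dict of settings, skipping comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        m = SETTING_RE.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings


def repoint_recovery_conf(text, old_host, new_host):
    """Hypothetical helper: swap host=old_host for host=new_host in primary_conninfo.

    All other lines (comments, restore_command, etc.) are copied unchanged.
    """
    out = []
    for line in text.splitlines():
        m = SETTING_RE.match(line)
        if m and m.group(1) == "primary_conninfo":
            conninfo = re.sub(rf"\bhost={re.escape(old_host)}\b",
                              f"host={new_host}", m.group(2))
            line = f"primary_conninfo = '{conninfo}'"
        out.append(line)
    return "\n".join(out) + "\n"
```

Usage: repoint_recovery_conf(original_text, "master1", "master2") returns the new file contents, which can then be written back before restarting replica1.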
On restart of replica1, I get the following error:
2011-04-10 13:27:24.766 PDT,,,2867,,4da212ac.b33,1,,2011-04-10 13:27:24 PDT,,0,FATAL,XX000,"timeline 2 of the primary does not match recovery target timeline 1",,,,,,,,,""
2011-04-10 13:27:29.875 PDT,,,2878,,4da212b1.b3e,1,,2011-04-10 13:27:29 PDT,,0,FATAL,XX000,"timeline 2 of the primary does not match recovery target timeline 1",,,,,,,,,""
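The log lines above are in csvlog format, which is dense to read. A small sketch for pulling out the interesting columns, assuming the 9.0/9.1 csvlog column order (process_id at index 3, error_severity at 11, sql_state_code at 12, message at 13) and one record per physical line, as in the actual log file rather than the wrapped lines in this mail. The function name summarize_csvlog is hypothetical.

```python
import csv
import io

# Column positions assumed from the 9.0/9.1 csvlog layout:
# log_time, user_name, database_name, process_id, connection_from,
# session_id, session_line_num, command_tag, session_start_time,
# virtual_transaction_id, transaction_id, error_severity,
# sql_state_code, message, detail, ...
PID, SEVERITY, SQLSTATE, MESSAGE = 3, 11, 12, 13


def summarize_csvlog(text):
    """Yield (pid, severity, sqlstate, message) for each csvlog record."""
    # csv.reader handles the quoted message field, including embedded commas.
    for row in csv.reader(io.StringIO(text)):
        if len(row) > MESSAGE:
            yield row[PID], row[SEVERITY], row[SQLSTATE], row[MESSAGE]
```

Run over the two FATAL lines, this yields the pids 2867 and 2878, each with severity FATAL, SQLSTATE XX000 and the "timeline 2 of the primary does not match recovery target timeline 1" message.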
If I try to manually change the timeline in recovery.conf to '2', I get:
2011-04-10 13:23:05.115 PDT,,,2834,,4da211a9.b12,2,,2011-04-10 13:23:05 PDT,,0,FATAL,XX000,"recovery target timeline 2 does not exist",,,,,,,,,""
2011-04-10 13:23:05.116 PDT,,,2832,,4da211a8.b10,1,,2011-04-10 13:23:04 PDT,,0,LOG,00000,"startup process (PID 2834) exited with exit code 1",,,,,,,,,""
2011-04-10 13:23:05.116 PDT,,,2832,,4da211a8.b10,2,,2011-04-10 13:23:04 PDT,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,,""
Receive location on master2:
0/93000078
Receive location on replica1:
0/93000000
... and in any case, this is a test system with no activity, so there's
no way replica1 can be ahead.
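To put numbers on the receive locations: a WAL location like 0/93000078 is two hex halves separated by a slash. The sketch below treats it as one flat 64-bit offset, which is a simplification for servers of this vintage, but it is exact when the high halves match, as they do here. The function names are illustrative only.

```python
def parse_lsn(text):
    """Convert an 'X/Y' WAL location (two hex halves) to a single integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lsn_diff(a, b):
    """Bytes by which location a is ahead of location b (negative if behind)."""
    return parse_lsn(a) - parse_lsn(b)
```

With the values above, lsn_diff("0/93000078", "0/93000000") is 0x78, i.e. replica1 is 120 bytes behind master2, not ahead of it.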
So it seems like we still don't have any way to promote an existing
standby to a new master. Is this fixable?
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com