From: | marin(at)kset(dot)org |
---|---|
To: | Martín Marqués <martin(at)2ndquadrant(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Slave promotion problem... |
Date: | 2015-08-31 14:05:26 |
Message-ID: | 02c9f0b656add4fae12ec2453fdc6b84@kset.org |
Lists: | pgsql-general |
On 2015-08-31 14:38, Martín Marqués wrote:
> El 31/08/15 a las 03:29, marin(at)kset(dot)org escribió:
>> Last week we had some problems on the master server which caused a
>> failover on the slave (the master was completely unresponsive due to
>> reasons still unknown). The slave received the promote signal (pg_ctl
>> promote) and logged that in the logs:
>> 2015-08-28 23:05:10 UTC [6]: [50-1] user=,db= LOG: received promote
>> request
>> 2015-08-28 23:05:10 UTC [467]: [2-1] user=,db= FATAL: terminating
>> walreceiver process due to administrator command
>>
>> 5 hours later the slave still didn't promote. Meanwhile we fixed the
>> master and restarted it. The slave was restarted and it behaved just
>> like the promote signal didn't arrive, connecting to the master as a
>> regular slave.
>
> Aren't there any further logs after the walreceiver termination?
> Up to here everything looks fine, but we have no idea on what was
> logged
> afterwards.
There are logs (quite a few, about 5 hours' worth), with entries like
this every second:
2015-08-28 23:05:12 UTC [79867]: [1-1] user=[unknown],db=[unknown] LOG: connection received: host=[local]
2015-08-28 23:05:12 UTC [79867]: [2-1] user=postgres,db=postgres LOG: connection authorized: user=postgres database=postgres
These entries come from the process that probes whether the server is alive.
I was expecting to see something like:
redo done at xxxxx
last completed transaction was at log time xxxxxxx
But those lines had not appeared even after 5 hours. As I understand it,
they are written before the server uses the restore_command to fetch WAL
and history files from the archive.
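For illustration, the check described above (was a promote request logged, and did a "redo done at" line follow it?) could be automated roughly as below. This is only a sketch: the function name promotion_status and its three return labels are made up for this example, and the message patterns are taken from the log excerpts quoted in this message, so they may need adjusting for a different log_line_prefix or server version.

```python
import re

# Patterns based on the log lines quoted in this thread. \s+ tolerates
# the line wrapping introduced by mail clients.
PROMOTE_RE = re.compile(r"LOG:\s+received promote\s+request")
REDO_DONE_RE = re.compile(r"LOG:\s+redo done at")


def promotion_status(log_text):
    """Classify a server log as 'not requested', 'pending' or 'redo done'.

    The labels are hypothetical names used only in this sketch.
    """
    last_promote = None
    for match in PROMOTE_RE.finditer(log_text):
        last_promote = match  # keep the most recent promote request
    if last_promote is None:
        return "not requested"
    # Only look for "redo done at" after the promote request.
    if REDO_DONE_RE.search(log_text, last_promote.end()):
        return "redo done"
    return "pending"


sample = (
    "2015-08-28 23:05:10 UTC [6]: [50-1] user=,db= LOG: received promote request\n"
    "2015-08-28 23:05:10 UTC [467]: [2-1] user=,db= FATAL: terminating "
    "walreceiver process due to administrator command\n"
    "2015-08-28 23:05:12 UTC [79867]: [1-1] user=[unknown],db=[unknown] LOG: "
    "connection received: host=[local]\n"
)
print(promotion_status(sample))  # prints "pending"
```

Run against the log from this incident, such a check would keep reporting "pending", which matches what was observed: the promote request was received but recovery never reported finishing.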
>
>> I am unsure if this promotion failure is a bug/glitch, but the promote
>> procedure is automated and tested a couple of hundred times so I am
>> certain we initiated the promote correctly.
>
> Are you using homemade scripts? Maybe you need to test them more
> thoroughly, with different environment parameters.
We use a custom script for the restore_command, but it seems that it was
not invoked.
Regards,
Mladen Marinović