From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | James Sewell <james(dot)sewell(at)jirotech(dot)com> |
Cc: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Critical failure of standby |
Date: | 2016-08-15 16:09:46 |
Message-ID: | CAMkU=1w388oBdz4PATSzCuzaND3C0JLPw2qv1pQjtPN729FnVw@mail.gmail.com |
Lists: | pgsql-general |
On Thu, Aug 11, 2016 at 10:39 PM, James Sewell <james(dot)sewell(at)jirotech(dot)com>
wrote:
> Hello,
>
> We recently experienced a critical failure when failing over to a DR
> environment.
>
> This is in the following environment:
>
>
> - 3 x PostgreSQL machines in Prod in a sync replication cluster
> - 3 x PostgreSQL machines in DR, with a single machine async and the
> other two cascading from the first machine.
>
> There was a network failure which isolated Production from everything
> else. Production had no errors during this time (and has now come back OK).
>
> DR did not tolerate the break: the following appeared in the logs, and
> none of the DR machines can start postgres. There were no queries coming
> into DR at the time of the break.
>
> Please note that the "Host Key verification failed" messages are due to
> the scp command not functioning. This means restore_command is not working
> to restore from the XLOG archive, but should not affect anything else.
>

In my experience, PostgreSQL issues its own error messages when
restore_command fails, so I see both the error from the command itself
and an error from PostgreSQL. Why don't you see that? Is the
restore_command failing, but then reporting that it succeeded?

And if you can't get files from the XLOG archive, why do you think that
is OK?
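
To make the exit-status question concrete, here is a minimal sketch of a
restore_command wrapper that cannot report success unless the copy really
worked. PostgreSQL treats a zero exit status from restore_command as "file
restored" and a nonzero status as failure. The script name, the archive
host, and the archive path below are hypothetical and are not taken from
the original configuration.

```python
import os
import subprocess
import sys


def restore_wal(fetch_cmd, dest_path):
    """Run fetch_cmd; return 0 only if it succeeded and dest_path exists."""
    result = subprocess.run(fetch_cmd)
    if result.returncode != 0:
        # The copy failed (for example, host key verification failed).
        # Pass the failure on instead of hiding it.
        return 1
    if not os.path.isfile(dest_path):
        # The command claimed success but produced no file.
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical setup: restore_command = 'python3 restore_wal.py %f %p'
    wal_name, dest = sys.argv[1], sys.argv[2]
    # "archive-host" and "/wal_archive/" are placeholder names.
    cmd = ["scp", "archive-host:/wal_archive/" + wal_name, dest]
    sys.exit(restore_wal(cmd, dest))
```

If the command configured on the DR machines ends with something that
always succeeds (a trailing echo, or "|| true"), the scp error would still
be printed but PostgreSQL would never see the failure, which would match
the logs described.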
Cheers,
Jeff