From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Hot standby, recovery infra |
Date: | 2009-02-09 15:13:02 |
Message-ID: | 499047FE.9090407@enterprisedb.com |
Lists: | pgsql-hackers |
Simon Riggs wrote:
> On Fri, 2009-02-06 at 10:06 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> On Thu, 2009-02-05 at 21:54 +0200, Heikki Linnakangas wrote:
>>>> - If you perform a fast shutdown while startup process is waiting for
>>>> the restore command, startup process sometimes throws a FATAL error
>>>> which escalates into an immediate shutdown. That leads to
>>>> different messages in the logs, and skipping of the shutdown
>>>> restartpoint that we now otherwise perform.
>>> Sometimes?
>> I think what happens is that if the restore command receives the SIGTERM
>> and dies before the startup process that's waiting for the restore
>> command receives the SIGTERM, the startup process throws a FATAL error
>> because the restore command died unexpectedly. I put this
>>
>>> if (shutdown_requested && InRedo)
>>> {
>>> /* XXX: Is EndRecPtr always the right value here? */
>>> UpdateMinRecoveryPoint(EndRecPtr);
>>> proc_exit(0);
>>> }
>> right after the "system(xlogRestoreCmd)" call, to exit gracefully if we
>> were requested to shut down while restore command was running, but it
>> seems that that's not enough because of the race condition.
>
> Can we trap the death of the restorecmd and handle it differently from
> the death of the startup process?

The startup process launches the restore command, so it's the startup
process that needs to handle the restore command's death.

Anyway, I think I've found a solution. While we're executing the restore
command, we're in a state where it's safe to call proc_exit(0). We can set
a flag that tells the signal handler when we're executing the restore
command, so that the signal handler can do proc_exit(0) on SIGTERM. So
if the startup process receives the SIGTERM first, it will proc_exit(0)
immediately, and if the restore command dies first due to the SIGTERM,
the startup process exits with proc_exit(0) when it sees that the restore
command exited because of the SIGTERM. If either process receives
SIGTERM for some reason other than a fast shutdown request, postmaster
will see that the startup process exited unexpectedly, and will handle
that like a child process crash.
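To make the idea concrete, here is a minimal standalone sketch of the flag-plus-handler scheme described above. It is not the code from the patch: the names in_restore_command, handle_sigterm, restore_killed_by_sigterm and run_restore_command are made up for illustration, _exit(0) and exit(0) stand in for proc_exit(0), and the check for exit code 128 + SIGTERM is an assumption about how some shells report a command that was killed by a signal.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flags; the names are illustrative, not taken from the patch. */
volatile sig_atomic_t in_restore_command = 0;
volatile sig_atomic_t shutdown_requested = 0;

/* SIGTERM handler: exit at once only while the restore command is running. */
void
handle_sigterm(int signo)
{
    (void) signo;
    if (in_restore_command)
        _exit(0);               /* stand-in for proc_exit(0) */
    shutdown_requested = 1;     /* otherwise let the main loop notice it */
}

/*
 * Did the command run by system() die from SIGTERM?  The second test is an
 * assumption: it covers shells that report a signalled child as exit code
 * 128 + signal number instead of dying from the signal themselves.
 */
int
restore_killed_by_sigterm(int status)
{
    if (status == -1)
        return 0;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
        return 1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 128 + SIGTERM;
}

/*
 * Run the restore command.  Returns 0 on success and -1 on failure; does
 * not return if the command was terminated by SIGTERM.
 */
int
run_restore_command(const char *cmd)
{
    int status;

    in_restore_command = 1;     /* handler may now exit immediately */
    status = system(cmd);
    in_restore_command = 0;

    if (restore_killed_by_sigterm(status))
        exit(0);                /* stand-in for proc_exit(0) */
    return (status == 0) ? 0 : -1;
}
```

A caller would install handle_sigterm with signal(SIGTERM, handle_sigterm) and then call run_restore_command() with the restore command string. Whichever of the two processes sees the SIGTERM first, the caller ends up exiting with status 0.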

Attached is an updated patch that does that, and I've fixed all the
other outstanding issues I listed earlier as well. I'm again feeling
that this is in pretty good shape.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
recovery-infra-2ffabdc.patch | text/x-diff | 68.9 KB |