Re: BUG #7643: Issuing a shutdown request while server startup leads to server hang

From: Hari Babu <haribabu(dot)kommi(at)huawei(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #7643: Issuing a shutdown request while server startup leads to server hang
Date: 2012-11-20 04:31:59
Message-ID: 005201cdc6d7$fb988f00$f2c9ad00$@kommi@huawei.com
Lists: pgsql-bugs

>haribabu(dot)kommi(at)huawei(dot)com writes:
> Problem Reproduction:
> 1. Add recovery.conf to the database directory.
> 2. Start the server
> 3. Issue the shutdown request
> and the shutdown request timing should be such that below server logs
> should print.

> Log:

> ./postgres -D data -p 3335
> LOG: database system was shut down in recovery at 2012-11-08 19:42:42 IST
> LOG: entering standby mode
> LOG: received fast shutdown request
> LOG: consistent recovery state reached at 0/17D0700
> LOG: record with zero length at 0/17D0700

> Problem reproduced in 9.3 head.

>After further investigation, I can't reproduce this and I don't believe
>your patch fixes it. What happens when I try this is

>* postmaster gets SIGINT, sends SIGTERM to startup process

>* startup process exits with exit(1)

>* postmaster sees that as a startup crash and exits, per the first
>test in reaper()

>So the log trace I'm getting looks like

>LOG: received fast shutdown request
>LOG: startup process (PID 9772) exited with exit code 1
>LOG: aborting startup due to startup process failure

>Now, transitioning to PM_WAIT_BACKENDS state earlier, as your patch
>proposes, might make the log look a bit nicer because the logic in
>reaper() wouldn't think the exit was a "crash". But it's not going to
>have anything to do with whether the startup process exits on the signal
>or not. What seems to have happened for you is that the startup process
>ignored the SIGTERM signal, but it's not at all obvious why.

>We're going to need more details about how to reproduce this.
>I speculate it might have something to do with the specific
>restore_command you're using.

The problem occurs only when an active server is restarted after just adding
a recovery.conf file to the data directory.
There is no need to specify any restore_command. Restarting a standby server
can also lead to this problem.

The startup process sends "PMSIGNAL_RECOVERY_STARTED" to the postmaster only
when the "InArchiveRecovery" flag is set.
To hit the problem, the SIGINT signal must reach the postmaster before the
"PMSIGNAL_RECOVERY_STARTED" sent by the startup process.

With the following code change in the StartupXLOG function, the issue can be
reproduced very easily.

if (InArchiveRecovery && IsUnderPostmaster)
{
	PublishStartupProcessInformation();
	SetForwardFsyncRequests();
	/* added for reproduction: SIGINT reaches the postmaster first */
	kill(PostmasterPid, SIGINT);
	SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED);
	bgwriterLaunched = true;
}
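
To make the ordering concrete, here is a toy model of the two events. It is
only a sketch and is not postmaster code: the state names are borrowed from
postmaster.c, but ToyPostmaster and the toy_* functions are hypothetical, and
the transition rules are my assumptions about the behaviour.

```c
#include <stdbool.h>

/* Toy model only: names mirror postmaster.c, but this is not PostgreSQL code. */
typedef enum { PM_STARTUP, PM_RECOVERY, PM_WAIT_BACKENDS } PMState;

typedef struct
{
    PMState state;
    bool    shutdown_requested;
} ToyPostmaster;

/*
 * Assumption: SIGINT records the fast shutdown request, but only moves to
 * PM_WAIT_BACKENDS if recovery has already been announced.
 */
void toy_sigint(ToyPostmaster *pm)
{
    pm->shutdown_requested = true;
    if (pm->state == PM_RECOVERY)
        pm->state = PM_WAIT_BACKENDS;
}

/*
 * Assumption: PMSIGNAL_RECOVERY_STARTED moves PM_STARTUP to PM_RECOVERY
 * without checking whether a shutdown is already pending.
 */
void toy_recovery_started(ToyPostmaster *pm)
{
    if (pm->state == PM_STARTUP)
        pm->state = PM_RECOVERY;
}

/* Deliver the two events in the given order and return the final state. */
PMState toy_run(bool sigint_first)
{
    ToyPostmaster pm = { PM_STARTUP, false };

    if (sigint_first)
    {
        toy_sigint(&pm);
        toy_recovery_started(&pm);
    }
    else
    {
        toy_recovery_started(&pm);
        toy_sigint(&pm);
    }
    return pm.state;
}
```

In this model, toy_run(true) ends in PM_RECOVERY with the shutdown request
still pending, while toy_run(false) ends in PM_WAIT_BACKENDS. Only the first
ordering corresponds to the case I am reporting.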

Please let me know if I have missed anything.

Regards,
Hari babu.
