Re: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea

From: Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: SIGTERM -> elog(FATAL) -> proc_exit() is probably a bad idea
Date: 2001-01-15 02:57:06
Message-ID: 3A626702.7DD48F11@tpf.co.jp
Lists: pgsql-hackers

Tom Lane wrote:
>
> Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp> writes:
> >>>> I've thought that the main purpose of CRIT_SECTION is to
> >>>> force redo recovery for any errors during the CRIT_SECTION
> >>>> to complete the critical operation e.g. bt_split().
> >>
> >> How could it force redo?
>
> > Doesn't proc_exit(non-zero) force shutdown recovery?
>
> It forces a shutdown and restart, but that does not do anything good
> that I can see. The WAL log entry hasn't been made, typically, so there
> is nothing to redo. If there *were* a log entry, and the redo failed
> again (pretty likely), then we'd have an infinite crash/try to
> restart/crash cycle, which is just about the worst possible behavior.
> So I'm not seeing what the point is.
>

That seems to be the nature of the 7.1 recovery scheme.
Once a WAL log entry is made, recovery should
complete the logged operation regardless of what caused
the recovery (elog, a system error such as SEGV, etc.).
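
A toy sketch of that behavior, not PostgreSQL code: every name here (WalRecord, startup_recovery, run_postmaster) is hypothetical, and the model assumes only that recovery replays each logged record without looking at why the previous run died, and that a failed redo prevents startup. It also shows the crash/restart cycle described in the quoted text above, bounded here by max_attempts so the example terminates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class WalRecord:
    """Hypothetical stand-in for one logged operation."""
    lsn: int
    redo: Callable[[Dict], None]  # re-applies the logged change to `state`


def startup_recovery(wal: List[WalRecord], state: Dict) -> bool:
    """Replay every record in order; the cause of the crash is never consulted."""
    for rec in wal:
        try:
            rec.redo(state)
        except Exception:
            # A redo failure means recovery itself failed.
            return False
    return True


def run_postmaster(wal: List[WalRecord], state: Dict, max_attempts: int = 3) -> int:
    """Return the attempt number on which startup succeeded, or -1 if it never did."""
    for attempt in range(1, max_attempts + 1):
        if startup_recovery(wal, state):
            return attempt
    return -1


def good_redo(state: Dict) -> None:
    state["split_done"] = True  # idempotent, so replaying it twice is harmless


def bad_redo(state: Dict) -> None:
    raise RuntimeError("redo failed")


healthy_wal = [WalRecord(1, good_redo)]
broken_wal = [WalRecord(1, good_redo), WalRecord(2, bad_redo)]
```

With healthy_wal the first startup attempt succeeds; with broken_wal every attempt fails at the same record, so retrying alone never gets the server started.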

I've wondered why no one has asked how we could
recover from a recovery failure. Unfortunately,
I don't know the answer. Recovery failure seems
veeeeery serious because the postmaster cannot
start if the startup recovery fails.
In addition, I have another worry. I don't know
how robust WAL is against general bugs not
directly related to WAL.

Regards.
Hiroshi Inoue
