Re: [ADMIN] Streaming Replication Server Crash

From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: raghu ram <raghuchennuru(at)gmail(dot)com>, pgsql-admin(at)postgresql(dot)org, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: [ADMIN] Streaming Replication Server Crash
Date: 2012-10-23 05:58:15
Message-ID: 508631F7.9040607@ringerc.id.au
Lists: pgsql-admin pgsql-general

On 10/23/2012 01:20 PM, Tom Lane wrote:
>
> This isn't the first time I've wondered exactly which signal was meant
> in a postmaster child-crash report. Seems like it might be worth
> expending some code on a symbolic translation, instead of just printing
> the number. That'd be easy enough (for common signal names) on Unix,
> but has anyone got a suggestion how we might do something useful on
> Windows?

Here's a typical Windows exception:

2012-10-04 14:29:08 CEST LOG: server process (PID 1416) was terminated
by exception 0xC0000005

2012-10-04 14:29:08 CEST HINT: See C include file "ntstatus.h" for a
description of the hexadecimal value.

These codes can be translated with FormatMessage:


http://msdn.microsoft.com/en-us/library/windows/desktop/ms679351(v=vs.85).aspx
http://support.microsoft.com/kb/259693

FormatMessage may not be safe to call when the heap is corrupted or
under other failure conditions, so you probably don't want to call it
from a crash handler inside the dying process. It is safe for the
postmaster to call it, though, using the exception code it gets from
the dying backend.

I'd say the best option is for the postmaster to print the
FormatMessage(
FORMAT_MESSAGE_ALLOCATE_BUFFER|FORMAT_MESSAGE_FROM_SYSTEM|FORMAT_MESSAGE_FROM_HMODULE,
...) output when it sees the exception code from the dying backend.
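
As a rough sketch of what that postmaster-side translation could look
like (this is not PostgreSQL code; describe_exception and
exception_code_name are names made up for illustration): it looks up a
few well-known codes from ntstatus.h in a small table and, on Windows
builds only, appends the FormatMessage text looked up through ntdll.dll,
which is where I understand the NTSTATUS message strings to live. The
FORMAT_MESSAGE_IGNORE_INSERTS flag is my own assumption rather than part
of the suggestion above; I'd expect it to be needed because some of
these messages contain placeholders and no argument array is passed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* A few common exception codes; values as defined in ntstatus.h. */
static const struct
{
    unsigned long code;
    const char *name;
} known_codes[] = {
    {0xC0000005UL, "STATUS_ACCESS_VIOLATION"},
    {0xC000001DUL, "STATUS_ILLEGAL_INSTRUCTION"},
    {0xC0000094UL, "STATUS_INTEGER_DIVIDE_BY_ZERO"},
    {0xC00000FDUL, "STATUS_STACK_OVERFLOW"},
    {0xC000013AUL, "STATUS_CONTROL_C_EXIT"}
};

/* Hypothetical helper: symbolic name for a code, or NULL if not in the table. */
const char *
exception_code_name(unsigned long code)
{
    size_t i;

    for (i = 0; i < sizeof(known_codes) / sizeof(known_codes[0]); i++)
    {
        if (known_codes[i].code == code)
            return known_codes[i].name;
    }
    return NULL;
}

/*
 * Hypothetical helper: write "0xNNNNNNNN (NAME)" into buf and, on Windows,
 * append the system message text when FormatMessage can find one.
 */
void
describe_exception(unsigned long code, char *buf, size_t buflen)
{
    const char *name = exception_code_name(code);

    assert(buf != NULL && buflen > 0);
    if (name == NULL)
        name = "unknown";

#ifdef _WIN32
    {
        char   *msg = NULL;
        HMODULE ntdll = GetModuleHandleA("ntdll.dll");
        DWORD   flags = FORMAT_MESSAGE_ALLOCATE_BUFFER |
                        FORMAT_MESSAGE_FROM_SYSTEM |
                        FORMAT_MESSAGE_IGNORE_INSERTS;

        /* Only ask for the module lookup if we actually have a handle. */
        if (ntdll != NULL)
            flags |= FORMAT_MESSAGE_FROM_HMODULE;

        if (FormatMessageA(flags, ntdll, (DWORD) code, 0,
                           (LPSTR) &msg, 0, NULL) > 0)
        {
            snprintf(buf, buflen, "0x%08lX (%s): %s", code, name, msg);
            LocalFree(msg);
            return;
        }
    }
#endif

    /* Non-Windows build, or no message text was found. */
    snprintf(buf, buflen, "0x%08lX (%s)", code, name);
}
```

Calling describe_exception(0xC0000005UL, buf, sizeof buf) should give a
string that starts with "0xC0000005 (STATUS_ACCESS_VIOLATION)"; whatever
follows depends on what FormatMessage returns on the machine in
question. The table is there so the log still gets a symbolic name when
the message lookup fails.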

RtlNtStatusToDosError may also be of interest:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms680600(v=vs.85).aspx
... but it's in Winternl.h so it's not guaranteed to exist / be
compatible between versions and can only be accessed via runtime dynamic
linking. Not ideal.
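
For completeness, the runtime dynamic linking would look roughly like
the following. This is only a sketch: ntstatus_to_win32 is a made-up
name, the function pointer type is written out by hand from the
documented prototype, and on a non-Windows build the function simply
reports that no translation is available.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Hypothetical helper: translate an NTSTATUS value to a Win32 error code.
 * Returns 1 and stores the result in *win32_error on success; returns 0
 * and stores 0 if RtlNtStatusToDosError could not be resolved.
 */
int
ntstatus_to_win32(unsigned long status, unsigned long *win32_error)
{
    assert(win32_error != NULL);

#ifdef _WIN32
    {
        /* Prototype per the documentation: ULONG RtlNtStatusToDosError(NTSTATUS). */
        typedef ULONG (WINAPI *RtlNtStatusToDosError_t) (LONG);
        HMODULE ntdll = GetModuleHandleA("ntdll.dll");

        if (ntdll != NULL)
        {
            RtlNtStatusToDosError_t fn = (RtlNtStatusToDosError_t)
                GetProcAddress(ntdll, "RtlNtStatusToDosError");

            if (fn != NULL)
            {
                *win32_error = fn((LONG) status);
                return 1;
            }
        }
    }
#else
    (void) status;              /* nothing to translate with here */
#endif

    *win32_error = 0;
    return 0;
}
```

The caller has to cope with a return value of 0 anyway, since the entry
point isn't guaranteed to be there, which is a large part of why I'd
prefer the FormatMessage route.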

--
Craig Ringer
