Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Marti Raudsepp <marti(at)juffo(dot)org> writes:
>>> This patch adds the backend's current running query to the
>>> "backend crash" message.
>>
>> Sorry, this patch is entirely unacceptable. We cannot have the
>> postmaster's functioning depending on the contents of shared
>> memory still being valid ... most especially not when we know
>> that somebody just crashed, and could have corrupted the shared
>> memory in arbitrary ways. No, I don't think your attempts to
>> validate the data are adequate, nor do I believe they can be made
>> adequate.
>
> Why and why not?
>
>> And I doubt
>> that the goal is worth taking risks for.
>
> I am unable to count the number of times that I have had a
> customer come to me and say "well, the backend crashed". And I go
> look at their logs and I have no idea what happened. So then I
> tell them to include %p in log_line_prefix and set
> log_min_duration_statement=0 and call me if it happens again.
> This is a huge nuisance and a serious interference with attempts
> to do meaningful troubleshooting. When it doesn't add days or
> weeks to the time to resolution, it's because it prevents
> resolution altogether. We really, really need something like
> this.

I haven't had this experience more than a few times, but a few is
enough to recognize how painful it can be. It seems we're brave
enough to log *some* information at crash time, in spite of the risk
that memory may be corrupted in unpredictable ways. Sure, there is
a slim chance that when you think you're writing to the log you've
actually got a handle to a segment of a heap file, but that chance
is extremely slim -- and if that's where you're at you've probably
already written a 'segfault' message there, anyway. My gut feel is
this would allow diagnosis in a timely fashion often enough to save
more data than it puts at risk, to say nothing of people's time.
I don't know whether the patch on the table is coded as defensively
as it should be given the perilous times the new code would come
into play, but I don't think the idea should be rejected out of
hand.
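
For reference, the stopgap Robert describes above amounts to roughly
the following in postgresql.conf. This is only a sketch of those two
settings; the trailing space in the prefix and the comments are my
own additions, not taken from his message.

```
# Prefix every log line with the backend's process ID, so a crash
# report naming a PID can be matched to the statements it ran.
log_line_prefix = '%p '

# Log every statement along with its duration (0 = no minimum).
log_min_duration_statement = 0
```

As far as I know both settings take effect on a configuration reload,
but logging every statement can be costly on a busy server, which is
part of why this is a poor substitute for having the query in the
crash message.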
-Kevin