Re: warning message in standby

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: warning message in standby
Date: 2010-06-14 10:21:41
Message-ID: AANLkTikTYTIEfEO01phjM0CnGFTPexdS-KRfAMYdeFa_@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 14, 2010 at 12:16, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Heikki Linnakangas wrote:
>> On 12/06/10 04:19, Bruce Momjian wrote:
>> > Robert Haas wrote:
>> >>> If my streaming replication stops working, I want to know about it as
>> >>> soon as possible. WARNING just doesn't cut it.
>> >>>
>> >>> This needs some better thought.
>> >>>
>> >>> If we PANIC, then surely it will PANIC again when we restart unless we
>> >>> do something. So we can't do that. But we need to do something better
>> >>> than
>> >>>
>> >>> WARNING there is a bug that will likely cause major data loss
>> >>> HINT you'll be sacked if you miss this message
>> >>
>> >> +1.  I was making this same argument (less eloquently) upthread.
>> >> I particularly like the errhint().
>> >
>> > I am wondering what action would be most likely to get the
>> > administrator's attention.
>>
>> I've committed the patch to disconnect the SR connection in that case.
>> If the message needs improvement, let's do that separately once we
>> figure out what to do.
>>
>> Seems like we need something like WARNING that doesn't cause the process
>> to die, but more alarming like ERROR/FATAL/PANIC. Or maybe just adding a
>> hint to the warning will do. How about
>>
>> WARNING:  invalid record length at 0/4005330
>> HINT: An invalid record was streamed from master. That can be a sign of
>> corruption in the master, or inconsistency between master and standby
>> state. The record will be re-fetched, but that is unlikely to fix the
>> problem. You may have to restore standby from base backup.
>
> I am thinking about log monitoring tools like Nagios.  I am afraid
> they are never going to pick up something tagged WARNING, no matter

If they are properly configured, I imagine they would. And if they're
not, well, there's not much for us to do.
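As a rough illustration of what "properly configured" could mean, here is a minimal sketch of a log check that escalates selected WARNING lines alongside FATAL and PANIC. The function name `find_alerts` and the pattern are hypothetical, the pattern is modelled only on the example message quoted above, and the sketch assumes log lines carry no `log_line_prefix`. It is not how Nagios or any particular tool is actually configured.

```python
import re

# Hypothetical rule: treat a WARNING as critical when its text matches a
# replication-related pattern. The pattern is an assumption based on the
# example message quoted in this thread.
CRITICAL_WARNING = re.compile(r"^WARNING:\s+invalid record length at \S+")


def find_alerts(lines):
    """Return the log lines that a monitor should escalate."""
    alerts = []
    for line in lines:
        text = line.rstrip("\n")
        # FATAL and PANIC are always escalated; WARNING only on a match.
        if text.startswith(("FATAL:", "PANIC:")) or CRITICAL_WARNING.match(text):
            alerts.append(text)
    return alerts
```

With a rule like this, an ordinary user-level warning is ignored while the streamed-record warning is escalated, without changing the severity that the server logs.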

(What would be more useful then would be to separate "user-warnings"
like warnings about casts from actual system-warnings like this, but
that's a whole different story.)

> what the wording is.  Crazy idea, but can we force a fatal error line
> into the logs with something like "WARNING ...\nFATAL: ...".

That's way too crazy :P And btw, randomly sticking newlines into that
will mess up *most* log displayers and I bet a lot of the log
monitoring tools as well...
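To make the newline problem concrete, here is a small sketch of a line-oriented parser of the kind many log tools use. The line layout (timestamp, severity, message), the names `LINE` and `parse_lines`, and the sample message text are all assumptions for illustration, not the behaviour of any specific tool.

```python
import re

# Assumed line layout: "<date> <time> <SEVERITY>:  <message>".
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+):\s+(.*)$")


def parse_lines(raw_log):
    """Parse raw log text line by line, as a line-oriented tool would.

    Returns (parsed, unparsed): parsed entries are (timestamp, severity,
    message) tuples; unparsed holds lines that do not fit the layout.
    """
    parsed, unparsed = [], []
    for line in raw_log.splitlines():
        match = LINE.match(line)
        if match:
            parsed.append(match.groups())
        else:
            unparsed.append(line)
    return parsed, unparsed


# One logical message with an embedded newline (hypothetical text).
raw = "2010-06-14 10:21:41 WARNING:  invalid record length\nFATAL: replication problem\n"
parsed, unparsed = parse_lines(raw)
```

Under these assumptions the embedded "FATAL:" line arrives without a timestamp prefix, so it lands in `unparsed` rather than being recognized as a FATAL entry.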

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
