Re: Log file monitoring and event notification

From: "Antman, Jason (CMG-Atlanta)" <Jason(dot)Antman(at)coxinc(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Log file monitoring and event notification
Date: 2014-04-05 19:17:49
Message-ID: 534056CC.6090009@coxinc.com
Lists: pgsql-general

General thought:

It's entirely possible my current Postgres environment is missing
something (I'm an automation engineer, not a DBA - most of my postgres
knowledge has been learned on the job or from Google), but we actively
monitor the receive and replay lag (i.e. comparing
pg_current_xlog_location() on the master to
pg_last_xlog_receive_location() and pg_last_xlog_replay_location() on
the slaves) and alert on that. We don't use any logs for replication
alerts.
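A minimal sketch of that lag check in Python follows. It assumes PostgreSQL 9.3 or later, where an xlog location "X/Y" (both hex) maps to the byte position (X << 32) + Y. It also assumes the three location strings have already been fetched from the master and the standby. The function names and the 16 MB limit are illustrative. On 9.2+ the subtraction can also be done server-side with pg_xlog_location_diff().

```python
# Sketch: compare master and standby xlog locations and flag excessive lag.
# Assumes PostgreSQL 9.3+ numbering, where "X/Y" is byte position (X << 32) + Y.


def xlog_to_bytes(location: str) -> int:
    """Convert an xlog location such as '0/3000060' into an absolute byte offset."""
    hi, lo = location.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)


def lag_bytes(master_location: str, standby_location: str) -> int:
    """Bytes of WAL the standby is behind the master (never negative)."""
    return max(0, xlog_to_bytes(master_location) - xlog_to_bytes(standby_location))


def check_lag(master: str, receive: str, replay: str, limit: int) -> list[str]:
    """Return human-readable problems; an empty list means both lags are fine."""
    problems = []
    for name, loc in (("receive", receive), ("replay", replay)):
        lag = lag_bytes(master, loc)
        if lag > limit:
            problems.append(f"{name} lag {lag} bytes exceeds {limit}")
    return problems


if __name__ == "__main__":
    # Hypothetical values, as they might come back from
    # pg_current_xlog_location() on the master and
    # pg_last_xlog_receive_location() / pg_last_xlog_replay_location() on a slave.
    print(check_lag("0/5000000", "0/4FFF000", "0/3000000", limit=16 * 1024 * 1024))
```

With those sample values, receive lag is 4096 bytes and replay lag is 32 MB, so only the replay lag is reported.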

We *do*, however, monitor postgres logs for other things. We use Nagios
(actually Icinga) as our monitoring system, and there's a nice Perl
plugin available online called check_logfiles
(http://exchange.nagios.org/directory/Plugins/Log-Files/check_logfiles/details)
that alerts on regular expressions matched in a log file, handles file
rotation very nicely (even compressed rotated files), and is highly
configurable (including Perl hook scripts to run when a match is found).
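check_logfiles does far more than this. Its core idea can still be sketched in a few lines of Python: remember where the last run stopped, notice rotation, and match only the new lines against a regex. This is not check_logfiles' code. The JSON state-file layout and the FATAL/PANIC pattern are assumptions made for the example. Unlike check_logfiles, this sketch does not go back and read lines written to the old file just before it was rotated.

```python
import json
import os
import re
import sys

# Nagios plugin exit codes used by this sketch.
OK, CRITICAL = 0, 2


def scan_new_lines(logfile: str, statefile: str, pattern: re.Pattern) -> list[str]:
    """Return complete lines matching `pattern` appended since the previous run.

    The read position is kept in `statefile` as {"inode": ..., "offset": ...}.
    If the inode changed or the file shrank, the log is assumed to have been
    rotated or truncated and is rescanned from the beginning.
    """
    try:
        with open(statefile) as f:
            state = json.load(f)
    except (FileNotFoundError, ValueError):
        state = {"inode": None, "offset": 0}

    st = os.stat(logfile)
    offset = state["offset"]
    if st.st_ino != state["inode"] or st.st_size < offset:
        offset = 0

    matches = []
    with open(logfile, "rb") as f:
        f.seek(offset)
        while True:
            raw = f.readline()
            if not raw.endswith(b"\n"):
                break  # EOF, or a line still being written: leave it for next run
            offset += len(raw)
            line = raw.decode("utf-8", errors="replace").rstrip("\n")
            if pattern.search(line):
                matches.append(line)

    with open(statefile, "w") as f:
        json.dump({"inode": st.st_ino, "offset": offset}, f)
    return matches


def main(logfile: str, statefile: str) -> int:
    hits = scan_new_lines(logfile, statefile, re.compile(r"\b(FATAL|PANIC):"))
    if hits:
        print(f"CRITICAL - {len(hits)} matching line(s)")
        print("\n".join(hits))
        return CRITICAL
    print("OK - no matching lines")
    return OK


if __name__ == "__main__":
    if len(sys.argv) == 3:
        sys.exit(main(sys.argv[1], sys.argv[2]))
    print(f"usage: {sys.argv[0]} LOGFILE STATEFILE", file=sys.stderr)
```

Keying on the inode rather than the file name is what lets it notice that logging_collector (or logrotate) has started a new file under the same name.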

In the simplest case (e.g. if you're not using a real monitoring system),
you could just configure this script, run it however you want (cron?)
and mail the output if it exits non-zero.
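The cron-plus-mail approach can be sketched as a small Python wrapper. It runs any check command (check_logfiles or otherwise) and mails the command's stdout and stderr only when it exits non-zero. The script name, the addresses, and the SMTP relay on localhost are all assumptions. Cron's own MAILTO already mails any output a job prints. That gets noisy when the check also prints something on success, which is why this wrapper only mails on failure.

```python
import smtplib
import subprocess
import sys
from email.message import EmailMessage


def build_alert(command: list[str], result: subprocess.CompletedProcess,
                sender: str, recipient: str) -> EmailMessage:
    """Wrap a failed check's output in a plain-text email."""
    msg = EmailMessage()
    msg["Subject"] = f"[log alert] {' '.join(command)} exited {result.returncode}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(result.stdout + result.stderr)
    return msg


def run_and_mail(command: list[str], sender: str, recipient: str,
                 smtp_host: str = "localhost") -> int:
    """Run `command`; if it exits non-zero, mail its output. Returns the exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        with smtplib.SMTP(smtp_host) as smtp:
            smtp.send_message(build_alert(command, result, sender, recipient))
    return result.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical addresses. Example crontab entry (every 5 minutes):
    #   */5 * * * * /usr/local/bin/run_and_mail.py /path/to/check --args
    sys.exit(run_and_mail(sys.argv[1:], "postgres@localhost", "dba@localhost"))
```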

In terms of embedding things in Postgres, I'm a staunch believer that
for performance and reliability, something like alerting shouldn't be
embedded in the application itself but should be handled by an external
(and easily replaceable) component. It's easy enough to do with
logging_collector, or to do with syslog (AFAIK the worry about not
capturing everything is only if you're shipping syslog over the network,
not if you're running a syslogd on the same host as postgres and writing
the logs locally).

From a systems management/monitoring standpoint, I'd much rather see
something in postgres that sends detailed, well-structured log messages
to a message queue than put the alerting logic in it (syslog works with
everything, but it's so horribly obsolete).
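As a rough illustration of what "well-structured" could mean, a small external parser can turn each log line into a JSON event and hand it to whatever queue client you use. The sketch assumes log_line_prefix = '%m [%p] ' and single-line messages. The field names are made up for the example, and no particular queue or client library is assumed.

```python
import json
import re
from typing import Optional

# Assumes log_line_prefix = '%m [%p] ' (timestamp with milliseconds, then the PID).
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)? \S+) "
    r"\[(?P<pid>\d+)\] "
    r"(?P<severity>[A-Z0-9]+):\s+(?P<message>.*)$"
)


def to_event(line: str) -> Optional[str]:
    """Turn one prefixed log line into a JSON event, or None if it doesn't match.

    Continuation lines (tab-indented parts of multi-line messages) return None;
    a fuller version would append them to the previous event.
    """
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    event = m.groupdict()
    event["pid"] = int(event["pid"])
    return json.dumps(event, sort_keys=True)


# Example: forward only serious events. `publish` is a hypothetical stand-in
# for a real queue client and is not defined here.
#   for line in log_lines:
#       event = to_event(line)
#       if event and json.loads(event)["severity"] in ("ERROR", "FATAL", "PANIC"):
#           publish(event)
```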

-Jason

On 04/05/14 11:47, Andy Colson wrote:
> Hi All.
>
> I've started using replication, and I'd like to monitor my logs for
> any errors or problems. I don't want to do it manually, and I'm not
> interested in stats (a la PgBadger).
>
> What I'd like is this: the instant PG logs "FATAL: wal segment already
> removed" (or some such bad thing), I'd like to get an email.
>
> 1st: is anyone using a program that does something like this? What do
> you use? How do you like it?
>
> My thinking has been along these lines:
>
> + log to syslog: doesn't really help, and I recall seeing somewhere
> "syslog may not capture everything". I still have monitoring and log
> rotation problems.
>
> + log to stderr and write my own collector works, but then I have to
> duplicate what logging_collector already does (rotating, truncating,
> age, size, etc). Too much work.
>
> + log with logging_collector, then write a thing to figure out what
> file it's writing to and tail it, watch for rotation, etc. This is just
> messy.
>
> If there isn't a program already available (which I've searched for,
> believe me), I'd like to get feedback on extending logging_collector
> with some lua scriptable event notification.
>
> Lua is small, fast, and mostly easy to embed. It would allow an admin
> to customize whatever kind of monitoring they want. When an event
> matches, logging_collector would spawn off a different app to handle
> the event notification. The app would be launched in the background
> and forgotten about so that logging isn't delayed.
>
> I'm thinking:
>
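> function checkLine(item)
>   -- plain substring search (4th argument true): no Lua pattern magic
>   if item:find('FATAL', 1, true) then
>     launch('/usr/bin/mynotify.pl', item)  -- launch() would be provided by the collector
>   end
> end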
>
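> logging_collector would then do something like this (forgive the Perl
> pseudo-code):
>
> ... regular log file rotation stuff ...
> open(my $out, '>>', $logfile) or die "can't open $logfile: $!";
> while (my $line = <STDIN>)    # the postmaster's stderr, piped in
> {
>     checkLine($line);          # hand each line to the Lua hook first
>     print $out $line;          # then write it to the log as usual
> }
>
> ... etc, etc ...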
>
> Lua could also have other handy event hooks defined:
> OnLogRotate(newFile)
> OnStartup()
> OnShutdown()
>
>
> Lua can also keep state, so maybe you don't want to email on the first
> FATAL, but on the third.
>
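> local cc = 0  -- FATAL lines seen since the last notification
> function checkLine(item)
>   if item:find('FATAL', 1, true) then
>     cc = cc + 1
>     if cc > 2 then  -- notify on the third FATAL, then start counting again
>       launch('/usr/bin/mynotify.pl', item)
>       cc = 0
>     end
>   end
> end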
>
> Thoughts?
>
> -Andy
>
>
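For comparison, the counting logic from the Lua sketch above can live in a separate process instead of inside the collector. That process would read log lines from a tail, a pipe, or a scanner like check_logfiles. Below is a minimal Python sketch. FatalCounter and launch_detached are illustrative names, and /usr/bin/mynotify.pl is just the example notifier from the quoted message. The notifier is started in its own session with its output discarded and is never waited on, which mirrors the "launch and forget" idea.

```python
import subprocess
from typing import Callable


class FatalCounter:
    """Call `notify` on every `threshold`-th line containing FATAL, then reset."""

    def __init__(self, threshold: int, notify: Callable[[str], None]) -> None:
        self.threshold = threshold
        self.notify = notify
        self.count = 0  # FATAL lines seen since the last notification

    def check_line(self, line: str) -> None:
        if "FATAL" in line:
            self.count += 1
            if self.count >= self.threshold:
                self.notify(line)
                self.count = 0


def launch_detached(argv: list[str]) -> None:
    """Start a notifier in its own session and do not wait for it.

    Output is discarded so a slow or chatty notifier cannot block the reader.
    """
    subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )


# Usage (e.g. fed by `tail -F` on the current log file):
#   counter = FatalCounter(3, lambda l: launch_detached(["/usr/bin/mynotify.pl", l]))
#   for line in sys.stdin:
#       counter.check_line(line)
```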

--

Jason Antman | Systems Engineer | CMGdigital
jason(dot)antman(at)coxinc(dot)com | p: 678-645-4155
