From: | "Antman, Jason (CMG-Atlanta)" <Jason(dot)Antman(at)coxinc(dot)com> |
---|---|
To: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Log file monitoring and event notification |
Date: | 2014-04-05 19:17:49 |
Message-ID: | 534056CC.6090009@coxinc.com |
Lists: | pgsql-general |
General thought:
It's entirely possible my current Postgres environment is missing
something (I'm an automation engineer, not a DBA; most of my Postgres
knowledge has been learned on the job or from Google), but we actively
monitor the receive and replay lag (i.e. comparing
pg_current_xlog_location() on the master to
pg_last_xlog_receive_location() and pg_last_xlog_replay_location() on
the slaves) and alert on that. We don't use any logs for replication
alerts.
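
For anyone wanting to script the same comparison, here is a minimal
sketch in Python of the arithmetic involved. It assumes you have already
fetched the textual locations (e.g. '16/B374D848') from the functions
above with whatever client you prefer. The function names, the
thresholds, and the Nagios-style status mapping are my own illustration,
not part of any existing plugin.

```python
# Hypothetical alert thresholds, in bytes of WAL; tune for your workload.
WARN_BYTES = 16 * 1024 * 1024
CRIT_BYTES = 64 * 1024 * 1024


def lsn_to_bytes(lsn):
    """Convert a textual WAL location such as '16/B374D848' to a byte position.

    The text is two hexadecimal numbers: the high and low 32 bits.
    """
    hi, lo = lsn.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lag_bytes(master_lsn, standby_lsn):
    """Bytes of WAL the standby is behind the master (never negative)."""
    return max(0, lsn_to_bytes(master_lsn) - lsn_to_bytes(standby_lsn))


def nagios_status(lag, warn=WARN_BYTES, crit=CRIT_BYTES):
    """Map a lag in bytes to a Nagios-style (exit code, label) pair."""
    if lag >= crit:
        return 2, "CRITICAL"
    if lag >= warn:
        return 1, "WARNING"
    return 0, "OK"
```

Calling lag_bytes() once with the receive location and once with the
replay location gives the two lags separately, which helps tell a
network problem apart from a slow replay.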
We *do*, however, monitor Postgres logs for other things. We use Nagios
(actually Icinga) as our monitoring system, and there's a nice Perl
plugin available online called check_logfiles
(http://exchange.nagios.org/directory/Plugins/Log-Files/check_logfiles/details)
that handles alerting on regular expressions in a log file, and also
very nicely handles file rotation (even compression), and is highly
configurable (including perl hook scripts to run if a match is found).
In the easiest case (like if you're not using a real monitoring system),
you could just configure this script, run it however you want (cron?)
and if it exits non-zero, mail the output.
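
A rough sketch of that "run it, mail the output on non-zero exit"
wrapper, in Python. The command to run, the addresses, and the `send`
callable are all placeholders of mine; nothing here is specific to
check_logfiles, it just relies on the usual convention that a non-zero
exit status means something needs attention.

```python
import subprocess
from email.message import EmailMessage


def run_check(cmd):
    """Run a check command; return (exit code, combined stdout/stderr)."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout


def build_alert(cmd, exit_code, output, sender, recipient):
    """Build (but do not send) an email describing the failed check."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "check failed (exit %d): %s" % (exit_code, cmd[0])
    msg.set_content(output or "(no output)")
    return msg


def check_and_alert(cmd, sender, recipient, send):
    """Run cmd; on non-zero exit, pass an alert message to send().

    Returns the message that was sent, or None if the check passed.
    """
    exit_code, output = run_check(cmd)
    if exit_code == 0:
        return None
    msg = build_alert(cmd, exit_code, output, sender, recipient)
    send(msg)
    return msg
```

From cron, `send` could be something like
`lambda m: smtplib.SMTP("localhost").send_message(m)`, assuming a local
MTA is listening; keeping it as a parameter makes the wrapper easy to
try out without sending real mail.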
In terms of embedding things in Postgres, I'm a staunch believer that
for performance and reliability, something like alerting shouldn't be
embedded in the application itself but should be handled by an external
(and easily replaceable) component. It's easy enough to do with
logging_collector, or to do with syslog (AFAIK the worry about not
capturing everything is only if you're shipping syslog over the network,
not if you're running a syslogd on the same host as Postgres and writing
the logs locally).
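
To illustrate what such an external component boils down to, here is a
minimal Python sketch that returns the lines matching a regular
expression that were appended to a log file since the previous run. It
assumes the log lives at a single known path and treats a changed inode
or a shrunken file as a rotation; the function name and the
(inode, offset) state tuple are my own invention. A tool like
check_logfiles handles far more cases (compressed and renamed archives,
for example), so take this as a sketch of the idea, not a replacement.

```python
import os
import re


def scan_new_lines(path, pattern, state=None):
    """Return (matching_lines, new_state) for lines added since last call.

    state is the (inode, offset) tuple returned by the previous call,
    or None on the first run. The caller is responsible for persisting
    it between runs (e.g. in a small file).
    """
    regex = re.compile(pattern)
    matches = []
    with open(path, "rb") as fh:
        st = os.fstat(fh.fileno())
        offset = 0
        if state is not None:
            inode, old_offset = state
            # Resume only if this is the same file and it has not shrunk;
            # otherwise assume rotation/truncation and start from the top.
            if inode == st.st_ino and old_offset <= st.st_size:
                offset = old_offset
        fh.seek(offset)
        for raw in fh:
            if not raw.endswith(b"\n"):
                break  # incomplete last line; pick it up on the next run
            offset += len(raw)
            line = raw.decode("utf-8", errors="replace").rstrip("\n")
            if regex.search(line):
                matches.append(line)
    return matches, (st.st_ino, offset)
```

Run periodically with a pattern such as r"FATAL|PANIC", a non-empty
result is the signal to alert on.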
From a systems management/monitoring standpoint, I'd much rather see
something in Postgres that sends detailed, well-structured log messages
to a message queue than put the alerting logic in it (syslog works with
everything, but it's so horribly obsolete).
-Jason
On 04/05/14 11:47, Andy Colson wrote:
> Hi All.
>
> I've started using replication, and I'd like to monitor my logs for
> any errors or problems. I don't want to do it manually, and I'm not
> interested in stats (a la PgBadger).
>
> What I'd like, is the instant PG logs: "FATAL: wal segment already
> removed" (or some such bad thing), I'd like to get an email.
>
> 1st: is anyone using a program that does something like this? What do
> you use? How do you like it?
>
> My thinking has been along these lines:
>
> + log to syslog doesn't really help, and I recall seeing somewhere
> "syslog may not capture everything". I still have monitoring and log
> rotation problems.
>
> + log to stderr and write my own collector works, but then I have to
> duplicate what logging_collector already does (rotating, truncating,
> age, size, etc). Too much work.
>
> + log with logging_collector, then write a thing to figure out what
> file it's writing to and tail it, watch for rotation, etc. This is just
> messy.
>
> If there isn't a program already available (which I've searched for,
> believe me), I'd like to get feedback on extending logging_collector
> with some lua scriptable event notification.
>
> Lua is small, fast, and mostly easy to embed. It would allow an admin
> to customize whatever kind of monitoring they want. When an event
> matches logging_collector would spawn off a different app to handle
> the event notification. The app would be launched in the background
> and forgotten about so that logging isn't delayed.
>
> I'm thinking:
>
> function checkLine(item)
>     if item:find('FATAL') then
>         launch('/usr/bin/mynotify.pl', item)
>     end
> end
>
> Logging_collector would then do something like (forgive the perl
> pseudo code):
>
> ... regular log file rotation stuff ..
> open OUT
> while ($line = <stderr>)
> {
>     checkLine($line);
>     print OUT $line;
> }
>
> ... etc, etc ...
>
> Lua could also have other handy events defined:
> OnLogRotate(newFile)
> OnStartup()
> OnShutdown()
>
>
> Lua can also keep state, so maybe you don't want to email on the first
> FATAL, but on the third.
>
> local cc = 0
> function checkLine(item)
>     if item:find('FATAL') then
>         cc = cc + 1
>         if cc > 2 then
>             launch('/usr/bin/mynotify.pl', item)
>             cc = 0
>         end
>     end
> end
>
> Thoughts?
>
> -Andy
>
>
--
Jason Antman | Systems Engineer | CMGdigital
jason(dot)antman(at)coxinc(dot)com | p: 678-645-4155