Re: Send duration output to separate log files

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Greg Sabino Mullane <htamfids(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Send duration output to separate log files
Date: 2024-07-24 13:45:19
Message-ID: CAFj8pRBYCsbP9qmFZzm3B5viX7=FO_OdCjYU+gf2YP+r50rsbQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

st 10. 7. 2024 v 17:58 odesílatel Greg Sabino Mullane <htamfids(at)gmail(dot)com>
napsal:

> Please find attached a patch to allow for durations to optionally be sent
> to separate log files. In other words, rather than cluttering up our
> postgres202007.log file with tons of output from
> log_min_duration_statement, duration lines are sent instead to the file
> postgres202007.duration.
>
> Over the years, durations have been the number one cause of very large log
> files, in which the more "important" items get buried in the noise. Also,
> programs that are scanning for durations typically do not care about the
> normal, non-duration output. Some people have a policy of logging
> everything, which in effect means setting log_min_duration_statement to 0,
> which in turn makes your log files nearly worthless for spotting
> non-duration items. This feature will also be very useful for those who
> need to temporarily turn on log_min_duration_statement, for some quick
> auditing of exactly what is being run on their database. When done, you can
> move or remove the duration file without messing up your existing log
> stream.
>
> This only covers the case when both the duration and statement are set on
> the same line. In other words, log_min_duration_statement output, but not
> log_duration (which is best avoided anyway). It also requires
> logging_collector to be on, obviously.
>
> Details:
>
> The edata structure is expanded to have a new message_type, with a
> matching function errmessagetype() created.
> [include/utils/elog.h]
> [backend/utils/elog.c]
>
> Any errors that have both a duration and a statement are marked via
> errmessagetype()
> [backend/tcop/postgres.c]
>
> A new GUC named "log_duration_destination" is created, which supports any
> combination of stderr, csvlog, and jsonlog. It does not need to match
> log_destination, in order to support different use cases. For example, the
> user wants durations sent to a CSV file for processing by some other tool,
> but still wants everything else going to a normal text log file.
>
> Code: [include/utils/guc_hooks.h] [backend/utils/misc/guc_tables.c]
> Docs: [sgml/config.sgml] [backend/utils/misc/postgresql.conf.sample]
>
> Create a new flag called PIPE_PROTO_DEST_DURATION
> [include/postmaster/syslogger.h]
>
> Create new flags:
> LOG_DESTINATION_DURATION,
> LOG_DESTINATION_DURATION_CSV,
> LOG_DESTINATION_DURATION_JSON
> [include/utils/elog.h]
>
> Routing and mapping LOG_DESTINATION to PIPE_PROTO
> [backend/utils/error/elog.c]
>
> Minor rerouting when using alternate forms
> [backend/utils/error/csvlog.c]
> [backend/utils/error/jsonlog.c]
>
> Create new filehandles, do log rotation, map PIPE_PROTO to
> LOG_DESTINATION. Rotation and entry into the "current_logfiles" file are
> the same as existing log files. The new names/suffixes are duration,
> duration.csv, and duration.json.
> [backend/postmaster/syslogger.c]
>
> Testing to ensure combinations of log_destination and
> log_duration_destination work as intended
> [bin/pg_ctl/meson.build]
> [bin/pg_ctl/t/005_log_duration_destination.pl]
>
> Questions I've asked along the way, and perhaps other might as well:
>
> What about logging other things? Why just duration?
>
> Duration logging is a very unique event in our logs. There is nothing
> quite like it - it's always client-driven, yet automatically generated. And
> it can be extraordinarily verbose. Removing it from the existing logging
> stream has no particular downsides. Almost any other class of log message
> would likely meet resistance as far as moving it to a separate log file,
> with good reason.
>
> Why not build a more generic log filtering case?
>
> I looked into this, but it would be a large undertaking, given the way our
> logging system works. And as per above, I don't think the pain would be
> worth it, as duration covers 99% of the use cases for separate logs.
> Certainly, nothing else other than a recurring ERROR from the client can
> cause massive bloat in the size of the files. (There is a nearby patch to
> exclude certain errors from the log file as a way to mitigate the error
> spam - I don't like that idea, but should mention it here as another effort
> to keep the log files a manageable size)
>
> Why not use an extension for this?
>
> I did start this as an extension, but it only goes so far. We can use
> emit_log_hook, but it requires careful coordination of open filehandles,
> has to do inefficient regex of every log message, and cannot do things like
> log rotation.
>
> Why not bitmap PIPE_PROTO *and* LOG_DESTINATION?
>
> I tried to do both as simple bitmaps (i.e. duration+csv = duration.csv),
> and not have to use e.g. LOG_DESTIATION_DURATION_CSV, but size_rotation_for
> ruined it for me. Since our PIPE always sends one thing at a time, a single
> new flag enables it to stay as a clean bits8 type.
>
> What about Windows?
>
> Untested. I don't have access to a Windows build, but I think in theory it
> should work fine.
>

I like the proposed feature, but I miss two points.

1. possibility to support auto_explain

2. possibility to support rsyslog by setting different or some syslog
related redirection by setting different facility.

Regards

Pavel

> Cheers,
> Greg
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-07-24 14:05:45 Re: warning: dereferencing type-punned pointer
Previous Message Peter Eisentraut 2024-07-24 13:32:38 Re: improve ssl error code, 2147483650