Re: RFC: pg_stat_logmsg

From: Joe Conway <mail(at)joeconway(dot)com>
To: Gurjeet Singh <gurjeet(at)singh(dot)im>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RFC: pg_stat_logmsg
Date: 2023-09-13 19:30:45
Message-ID: cda17e6d-f625-b45e-dfb9-8d18ed09ecc5@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/9/23 14:13, Joe Conway wrote:
> On 7/7/23 01:38, Gurjeet Singh wrote:
>>> In this case, the error message stored in pg_stat_logmsg is just '%s'.
>>> The filename and line number columns might help identify the actual
>>> error but it requires users to read the source code and cannot know
>>> the actual error message.
>>
>> I believe that the number of such error messages, the ones with very
>> little descriptive content, is very low in Postgres code. These kinds
>> of messages are not prevalent, and must be used when code hits obscure
>> conditions, like seen in your example above. These are the kinds of
>> errors that Joe is referring to as "should not happen". In these
>> cases, even if the error message was descriptive, the user will very
>> likely have to dive deep into code to find out the real cause.
>
> Agreed. As an example, after running `make installcheck`
>
> 8<-----------------
> select sum(count) from pg_stat_logmsg;
> sum
> ------
> 6005
> (1 row)
>
> select message,sum(count)
> from pg_stat_logmsg
> where length(message) < 30
> and elevel in ('ERROR','FATAL','PANIC')
> and message like '%\%s%' escape '\'
> group by message
> order by length(message);
> message | sum
> -------------------------------+-----
> %s | 107
> "%s" is a view | 9
> "%s" is a table | 4
> %s is a procedure | 1
> invalid size: "%s" | 13
> %s at or near "%s" | 131
> %s at end of input | 3
> ...
> 8<-----------------
>
> I only see three message formats there that are generic enough to be of
> concern (the first and last two shown -- FWIW I did not see any more of
> them as the fmt string gets longer)
>
> So out of 6005 log events, 241 fit this concern.
>
> Perhaps given the small number of message format strings affected, it
> would make sense to special case those, but I am not sure it is worth
> the effort, at least for version 1.

Attached is an update, this time as a patch against 17devel. It is not
quite complete, but getting close IMHO.

Changes:
--------
1. Now is integrated into contrib as a "Additional Supplied Extension"

2. Documentation added

3. I had a verbal conversation with Andres and he reminded me that the
original idea for this was to collect data across fleets of pg hosts so
we could look for how often "should never happen" errors actually
happen. As such, we need to think in terms of aggregating the info from
many hosts, potentially with many localized languages for the messages.
So I converted the "message" column back to an untranslated string, and
added a "translated_message" column which is localized.

An alternative approach might be to provide a separate function that
accepts the message string and returns the translation. Thoughts on that?

4. In the same vein, I added a pgversion column since the filename and
line number for the same log message could vary across major or even
minor releases.

Not done:
---------
1. The main thing lacking at this point is a regression test.

2. No special casing for message == "%s". I am still not convinced it is
worthwhile to do so.

Comments gratefully welcomed.

Thanks,

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
20230912-000.diff text/x-patch 82.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-09-13 19:32:46 Re: [PATCH] Add native windows on arm64 support
Previous Message Peter Eisentraut 2023-09-13 19:12:46 Re: [PATCH] Add native windows on arm64 support