Re: BUG #17327: Postgres server does not correctly emit error for max_slot_wal_keep_size being breached

From: Alex Enachioaie <alex(at)altmetric(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17327: Postgres server does not correctly emit error for max_slot_wal_keep_size being breached
Date: 2021-12-13 09:45:51
Message-ID: FGS14R.K3L8GOS0U6ZQ1@altmetric.com
Lists: pgsql-bugs

Hello Kyotaro,

Understood, that makes sense regarding invalidation and what I assumed
might be happening.

I'm happy to leave the method of resolution up to you. The main point
for me is that when a replication process is terminated because its
underlying temporary replication slot has reached
max_slot_wal_keep_size, we log a specific message that tells the user
the cause of the termination rather than leaving it ambiguous.

Thank you

Kind regards

Alex E
Senior Site Reliability Engineer
Altmetric

On Mon, Dec 13 2021 at 14:44:42 +0900, Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
> At Fri, 10 Dec 2021 15:46:11 +0000, Alex Enachioaie
> <alex(at)altmetric(dot)com> wrote in
>> So, essentially the server-side log emitted when a temporary
>> replication slot breaches the max_slot_wal_keep_size limit is only:
>>
>> 2021-12-03 16:21:54 UTC [29724-2647] LOG: terminating process 42601 to release replication slot "pg_basebackup_42601"
>>
>> whereas for a persistent replication slot we get an additional line
>> that clearly states _why_ the replication process was terminated:
>>
>> 2021-12-03 00:57:16 UTC [29724-2645] LOG: terminating process 3899 to release replication slot "backup"
>> 2021-12-03 00:57:16 UTC [29724-2646] LOG: invalidating slot "backup" because its restart_lsn 47198/1E000000 exceeds max_slot_wal_keep_size
>>
>> I'm not sure if this means that in the case of a temporary slot it
>> does not get invalidated at all (I've not looked at the code), or it's
>> simply that we don't emit a log message when it does because the slot
>> would be discarded anyway, but such a message would be very useful for
>> diagnostic purposes imo.
>
> The "invalidating slot" message is emitted when the slot needs to be
> invalidated, that is, when the slot persists after the user process is
> terminated. Thus the message cannot be seen for temporary slots since
> they are removed at process termination and no longer exist after
> that.
>
> At Wed, 08 Dec 2021 11:23:35 +0000, PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote in
>> The core issue here then in our opinion is that Postgres server should
>> log an error when the max_slot_wal_keep_size limit is reached for
>> temporary replication slots as well as for permanent ones, as otherwise
>> users/administrators are presented only with nondescript connection
>> termination errors which do not point to the actual cause of the
>> problem.
>
> If you mean the "invalidating slot" message by "an error", that
> wouldn't happen since invalidation doesn't actually happen. Or, we
> could change the message like this. Does this make sense to you?
>
>> LOG: terminating process 42601 to release temporary replication
>> slot "pg_basebackup_42601"
>> DETAIL: The slot will be dropped by the process termination.
>
>
>> LOG: terminating process 3899 to release persistent replication
>> slot "backup"
> ...
>> LOG: invalidating slot "backup" because its restart_lsn
>> 47198/1E000000 exceeds max_slot_wal_keep_size
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
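
The difference between the two cases discussed above can be checked mechanically. What follows is a minimal Python sketch, assuming server log lines worded exactly as quoted in this thread. The function name `unexplained_terminations` and both regular expressions are illustrative assumptions, not part of PostgreSQL. It lists slots whose owning process was terminated without an accompanying "invalidating slot" line, which is the ambiguous case reported for temporary slots.

```python
import re

# Patterns are assumptions based only on the log lines quoted in this thread.
TERMINATE_RE = re.compile(
    r'terminating process (?P<pid>\d+) to release replication slot "(?P<slot>[^"]+)"'
)
INVALIDATE_RE = re.compile(
    r'invalidating slot "(?P<slot>[^"]+)" because its restart_lsn \S+ '
    r'exceeds max_slot_wal_keep_size'
)


def unexplained_terminations(log_text):
    """Return (pid, slot) pairs terminated without a matching invalidation line."""
    # Collapse all whitespace so lines re-wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    invalidated = {m.group("slot") for m in INVALIDATE_RE.finditer(flat)}
    return [
        (int(m.group("pid")), m.group("slot"))
        for m in TERMINATE_RE.finditer(flat)
        if m.group("slot") not in invalidated
    ]


# The three log lines quoted earlier in this thread.
SAMPLE_LOG = """
2021-12-03 16:21:54 UTC [29724-2647] LOG: terminating process 42601 to release replication slot "pg_basebackup_42601"
2021-12-03 00:57:16 UTC [29724-2645] LOG: terminating process 3899 to release replication slot "backup"
2021-12-03 00:57:16 UTC [29724-2646] LOG: invalidating slot "backup" because its restart_lsn 47198/1E000000 exceeds max_slot_wal_keep_size
"""

if __name__ == "__main__":
    for pid, slot in unexplained_terminations(SAMPLE_LOG):
        print(f"process {pid} terminated for slot {slot!r} with no stated reason")
```

On the sample lines only the temporary slot "pg_basebackup_42601" is reported, because the persistent slot "backup" has a matching invalidation line. A slot appearing in this list is not proof that max_slot_wal_keep_size was the cause; it only marks terminations for which the log gives no explicit reason.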
