From: | Alex Enachioaie <alex(at)altmetric(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #17327: Postgres server does not correctly emit error for max_slot_wal_keep_size being breached |
Date: | 2021-12-13 09:45:51 |
Message-ID: | FGS14R.K3L8GOS0U6ZQ1@altmetric.com |
Lists: | pgsql-bugs |
Hello Kyotaro,
Understood; that makes sense regarding invalidation, and it matches what I
assumed might be happening.
I'm happy to leave the method of resolution up to you. The main point for
me is that when a replication process is terminated because its underlying
temporary replication slot has exceeded max_slot_wal_keep_size, we should
log a specific message that tells the user the cause of the termination
rather than leaving it ambiguous.
Thank you,
Kind regards
Alex E
Senior Site Reliability Engineer
Altmetric
On Mon, Dec 13 2021 at 14:44:42 +0900, Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
> At Fri, 10 Dec 2021 15:46:11 +0000, Alex Enachioaie
> <alex(at)altmetric(dot)com> wrote in
>> So, essentially the only server-side log line emitted when a temporary
>> replication slot breaches the max_slot_wal_keep_size limit is:
>>
>> 2021-12-03 16:21:54 UTC [29724-2647] LOG: terminating process 42601 to release replication slot "pg_basebackup_42601"
>>
>> whereas for a persistent replication slot we get an additional line
>> that clearly states _why_ the replication process was terminated:
>>
>> 2021-12-03 00:57:16 UTC [29724-2645] LOG: terminating process 3899 to release replication slot "backup"
>> 2021-12-03 00:57:16 UTC [29724-2646] LOG: invalidating slot "backup" because its restart_lsn 47198/1E000000 exceeds max_slot_wal_keep_size
>>
>> I'm not sure whether this means that a temporary slot does not get
>> invalidated at all (I've not looked at the code) or simply that we
>> don't emit a log message when it does because the slot would be
>> discarded anyway, but such a message would be very useful for
>> diagnostic purposes imo.
>
> The "invalidating slot" message is emitted when the slot needs to be
> invalidated, that is, when the slot persists after the user process is
> terminated. Thus the message cannot be seen for temporary slots since
> they are removed at process termination and no longer exist after
> that.
>
> At Wed, 08 Dec 2021 11:23:35 +0000, PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote in
>> The core issue here, in our opinion, is that the Postgres server should
>> log an error when the max_slot_wal_keep_size limit is reached for
>> temporary replication slots as well as for permanent ones; otherwise
>> users/administrators are presented only with nondescript connection
>> termination errors that do not point to the actual cause of the
>> problem.
>
> If by "an error" you mean the "invalidating slot" message, that
> wouldn't happen, since invalidation doesn't actually take place for
> temporary slots. Alternatively, we could change the messages like this.
> Does this make sense to you?
>
>> LOG: terminating process 42601 to release temporary replication slot "pg_basebackup_42601"
>> DETAIL: The slot will be dropped by the process termination.
>
>
>> LOG: terminating process 3899 to release persistent replication slot "backup"
> ...
>> LOG: invalidating slot "backup" because its restart_lsn 47198/1E000000 exceeds max_slot_wal_keep_size
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
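The difference discussed in this thread (a temporary slot is dropped when its process exits, so no "invalidating slot" line appears, while a persistent slot survives and is invalidated) can be modeled in a few lines. This is a toy Python model under stated assumptions, not PostgreSQL's C implementation: `Slot`, `current_messages` and `proposed_messages` are hypothetical names, the message strings are copied from the logs quoted above, and the "proposed" wording is only the suggestion made in this thread, not committed behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    """Hypothetical stand-in for a replication slot (not PostgreSQL's struct)."""
    name: str
    temporary: bool
    active_pid: int
    restart_lsn: Optional[str] = None


def current_messages(slot: Slot) -> List[str]:
    """Log lines observed in this thread when a slot exceeds max_slot_wal_keep_size."""
    msgs = [
        f'LOG: terminating process {slot.active_pid} to release replication slot "{slot.name}"'
    ]
    # A temporary slot is dropped when its owning process exits, so nothing is
    # left to invalidate and no line naming the cause is logged.
    if not slot.temporary:
        msgs.append(
            f'LOG: invalidating slot "{slot.name}" because its restart_lsn '
            f"{slot.restart_lsn} exceeds max_slot_wal_keep_size"
        )
    return msgs


def proposed_messages(slot: Slot) -> List[str]:
    """Wording suggested in this thread (a proposal only)."""
    kind = "temporary" if slot.temporary else "persistent"
    msgs = [
        f"LOG: terminating process {slot.active_pid} to release {kind} "
        f'replication slot "{slot.name}"'
    ]
    if slot.temporary:
        # The slot will not survive, so explain that instead of invalidating it.
        msgs.append("DETAIL: The slot will be dropped by the process termination.")
    else:
        msgs.append(
            f'LOG: invalidating slot "{slot.name}" because its restart_lsn '
            f"{slot.restart_lsn} exceeds max_slot_wal_keep_size"
        )
    return msgs


# The two slots from the logs quoted in this thread.
backup = Slot("backup", temporary=False, active_pid=3899, restart_lsn="47198/1E000000")
basebackup = Slot("pg_basebackup_42601", temporary=True, active_pid=42601)

if __name__ == "__main__":
    for line in current_messages(basebackup) + proposed_messages(basebackup):
        print(line)
```

Modeling the slot's persistence as a single boolean keeps the point of the discussion visible: today the explanatory line depends entirely on whether the slot outlives the terminated process.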