Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-11-29 12:36:56
Message-ID: CABdArM5ENt1fFxGjx6-NQ863ia1U9poqC7XCMcc178syr7sdYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 28, 2024 at 1:29 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Nisha,
>
> >
> > Attached v51 patch-set addressing all comments in [1] and [2].
> >
>
> Thanks for working on the feature! I've stated to review the patch.
> Here are my comments - sorry if there are something which have already been discussed.
> The thread is too long to follow correctly.
>
> Comments for 0001
> =============
>
> 01. binary_upgrade_logical_slot_has_caught_up
>
> ISTM that error_if_invalid is set to true when the slot can be moved forward, otherwise
> it is set to false. Regarding the binary_upgrade_logical_slot_has_caught_up, however,
> only valid slots will be passed to the funciton (see pg_upgrade/info.c) so I feel
> it is OK to set to true. Thought?
>

Right, corrected the call with error_if_invalid as true.

> Comments for 0002
> =============
>
> 03. check_replication_slot_inactive_timeout
>
> Can we overwrite replication_slot_inactive_timeout to zero when pg_uprade (and also
> pg_createsubscriber?) starts a server process? Several parameters have already been
> specified via -c option at that time. This can avoid an error while the upgrading.
> Note that this part is still needed even if you accept the comment. Users can
> manually boot with upgrade mode.
>

Done.

> 06. found bug
>
> While testing the patch, I found that slots can be invalidated too early when when
> the GUC is quite large. I think because an overflow is caused in InvalidatePossiblyObsoleteSlot().
>
> - Reproducer
>
> I set the replication_slot_inactive_timeout to INT_MAX and executed below commands,
> and found that the slot is invalidated.
>
> ```
> postgres=# SHOW replication_slot_inactive_timeout;
> replication_slot_inactive_timeout
> -----------------------------------
> 2147483647s
> (1 row)
> postgres=# SELECT * FROM pg_create_logical_replication_slot('test', 'test_decoding');
> slot_name | lsn
> -----------+-----------
> test | 0/18B7F38
> (1 row)
> postgres=# CHECKPOINT ;
> CHECKPOINT
> postgres=# SELECT slot_name, inactive_since, invalidation_reason FROM pg_replication_slots ;
> slot_name | inactive_since | invalidation_reason
> -----------+-------------------------------+---------------------
> test | 2024-11-28 07:50:25.927594+00 | inactive_timeout
> (1 row)
> ```
>
> - analysis
>
> In InvalidatePossiblyObsoleteSlot(), replication_slot_inactive_timeout_sec * 1000
> is passed to the third argument of TimestampDifferenceExceeds(), which is also the
> integer datatype. This causes an overflow and parameter is handled as the small
> value.
>
> - solution
>
> I think there are two possible solutions. You can choose one of them:
>
> a. Make the maximum INT_MAX/1000, or
> b. Change the unit to millisecond.
>

Fixed. It is reasonable to align with other timeout parameters by
using milliseconds as the unit.

--
Thanks,
Nisha

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nisha Moond 2024-11-29 12:37:32 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Nisha Moond 2024-11-29 12:36:33 Re: Introduce XID age and inactive timeout based replication slot invalidation