From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Nisha Moond' <nisha(dot)moond412(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com> |
Subject: | RE: Introduce XID age and inactive timeout based replication slot invalidation |
Date: | 2024-11-28 07:59:21 |
Message-ID: | TYAPR01MB56927564EEE26E5433198405F5292@TYAPR01MB5692.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Dear Nisha,
>
> Attached v51 patch-set addressing all comments in [1] and [2].
>
Thanks for working on the feature! I've stated to review the patch.
Here are my comments - sorry if there are something which have already been discussed.
The thread is too long to follow correctly.
Comments for 0001
=============
01. binary_upgrade_logical_slot_has_caught_up
ISTM that error_if_invalid is set to true when the slot can be moved forward, otherwise
it is set to false. Regarding the binary_upgrade_logical_slot_has_caught_up, however,
only valid slots will be passed to the funciton (see pg_upgrade/info.c) so I feel
it is OK to set to true. Thought?
02. ReplicationSlotAcquire
According to other functions, we are adding to a note to the translator when
parameters represent some common nouns, GUC names. I feel we should add a comment
for RS_INVAL_WAL_LEVEL part based on it.
Comments for 0002
=============
03. check_replication_slot_inactive_timeout
Can we overwrite replication_slot_inactive_timeout to zero when pg_uprade (and also
pg_createsubscriber?) starts a server process? Several parameters have already been
specified via -c option at that time. This can avoid an error while the upgrading.
Note that this part is still needed even if you accept the comment. Users can
manually boot with upgrade mode.
04. ReplicationSlotAcquire
Same comment as 02.
05. ReportSlotInvalidation
Same comment as 02.
06. found bug
While testing the patch, I found that slots can be invalidated too early when when
the GUC is quite large. I think because an overflow is caused in InvalidatePossiblyObsoleteSlot().
- Reproducer
I set the replication_slot_inactive_timeout to INT_MAX and executed below commands,
and found that the slot is invalidated.
```
postgres=# SHOW replication_slot_inactive_timeout;
replication_slot_inactive_timeout
-----------------------------------
2147483647s
(1 row)
postgres=# SELECT * FROM pg_create_logical_replication_slot('test', 'test_decoding');
slot_name | lsn
-----------+-----------
test | 0/18B7F38
(1 row)
postgres=# CHECKPOINT ;
CHECKPOINT
postgres=# SELECT slot_name, inactive_since, invalidation_reason FROM pg_replication_slots ;
slot_name | inactive_since | invalidation_reason
-----------+-------------------------------+---------------------
test | 2024-11-28 07:50:25.927594+00 | inactive_timeout
(1 row)
```
- analysis
In InvalidatePossiblyObsoleteSlot(), replication_slot_inactive_timeout_sec * 1000
is passed to the third argument of TimestampDifferenceExceeds(), which is also the
integer datatype. This causes an overflow and parameter is handled as the small
value.
- solution
I think there are two possible solutions. You can choose one of them:
a. Make the maximum INT_MAX/1000, or
b. Change the unit to millisecond.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
From | Date | Subject | |
---|---|---|---|
Next Message | Andrey M. Borodin | 2024-11-28 08:09:43 | Re: UUID v7 |
Previous Message | Anthonin Bonnefoy | 2024-11-28 07:53:09 | Re: Consider pipeline implicit transaction as a transaction block |