From: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com> |
---|---|
To: | Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Date: | 2025-02-03 13:05:43 |
Message-ID: | CANhcyEWgb8WquGM34xHCknAQxVVp0yYBCsOtEgo5r0Y9aimFAw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 31 Jan 2025 at 17:50, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
>
> On Fri, Jan 31, 2025 at 2:32 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Jan 31, 2025 at 10:40 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > ======
> > > src/backend/replication/slot.c
> > >
> > > ReportSlotInvalidation:
> > >
> > > 1.
> > > +
> > > + case RS_INVAL_IDLE_TIMEOUT:
> > > + Assert(inactive_since > 0);
> > > + /* translator: second %s is a GUC variable name */
> > > + appendStringInfo(&err_detail,
> > > + _("The slot has remained idle since %s, which is longer than the
> > > configured \"%s\" duration."),
> > > + timestamptz_to_str(inactive_since),
> > > + "idle_replication_slot_timeout");
> > > + break;
> > > +
> > >
> > > errdetail:
> > >
> > > I guess it is no fault of this patch because I see you've only copied
> > > nearby code, but AFAICT this function is still having an each-way bet
> > > by using a mixture of _() macro which is for strings intended be
> > > translated, but then only using them in errdetail_internal() which is
> > > for strings that are NOT intended to be translated. Isn't it
> > > contradictory? Why don't we use errdetail() here?
> > >
> >
> > Your question is valid and I don't have an answer. I encourage you to
> > start a new thread to clarify this.
> >
> > > errhint:
> > >
> > > Also, the way the 'hint' is implemented can only be meaningful for
> > > RS_INVAL_WAL_REMOVED. This is also existing code that IMO it was
> > > always strange, but now that this patch has added another kind of
> > > switch (cause) this hint implementation now looks increasingly hacky
> > > to me; it is also inflexible -- e.g. if you ever wanted to add
> > > different hints. A neater implementation would be to make the code
> > > more like how the err_detail is handled, so then the errhint string
> > > would only be assigned within the "case RS_INVAL_WAL_REMOVED:"
> > >
> >
> > This makes sense to me.
> >
> > +
> > + case RS_INVAL_IDLE_TIMEOUT:
> > + Assert(inactive_since > 0);
> > + /* translator: second %s is a GUC variable name */
> > + appendStringInfo(&err_detail,
> > + _("The slot has remained idle since %s, which is longer than the
> > configured \"%s\" duration."),
> > + timestamptz_to_str(inactive_since),
> > + "idle_replication_slot_timeout");
> >
> > I think the above message should be constructed on a model similar to
> > the following nearby message:"The slot's restart_lsn %X/%X exceeds the
> > limit by %llu bytes.". So, how about the following: "The slot's idle
> > time %s exceeds the configured \"%s\" duration"?
> >
> > Also, similar to max_slot_wal_keep_size, we should give a hint in this
> > case to increase idle_replication_slot_timeout.
> >
> > It is not clear why the injection point test is doing
> > pg_sync_replication_slots() etc. in the patch. The test should be
> > simple such that after creating a new physical or logical slot, enable
> > the injection point, then run the manual checkpoint command, and check
> > the invalidation status of the slot.
> >
>
> Thanks for the review! I have incorporated the above comments. The
> test in patch-002 has been optimized as suggested and now completes in
> less than a second.
> Please find the attached v66 patch set. The base patch(v65-001) is
> committed now, so I have rebased the patches.
>
> Thank you, Kuroda-san, for working on patch-002.
>
Hi Nisha,
I reviewed the v66 patch. I have few comments:
1. I also feel the default value should be set to '0' as suggested by
Vignesh in 1st point of [1].
2. Should we allow copying of invalidated slots?
Currently we are able to copy slots which are invalidated:
postgres=# select slot_name, active, restart_lsn, wal_status,
inactive_since , invalidation_reason from pg_replication_slots;
slot_name | active | restart_lsn | wal_status |
inactive_since | invalidation_reason
-----------+--------+-------------+------------+----------------------------------+---------------------
test1 | f | 0/16FDDE0 | lost | 2025-02-03
18:28:01.802463+05:30 | idle_timeout
(1 row)
postgres=# select pg_copy_logical_replication_slot('test1', 'test2');
pg_copy_logical_replication_slot
----------------------------------
(test2,0/16FDE18)
(1 row)
postgres=# select slot_name, active, restart_lsn, wal_status,
inactive_since , invalidation_reason from pg_replication_slots;
slot_name | active | restart_lsn | wal_status |
inactive_since | invalidation_reason
-----------+--------+-------------+------------+----------------------------------+---------------------
test1 | f | 0/16FDDE0 | lost | 2025-02-03
18:28:01.802463+05:30 | idle_timeout
test2 | f | 0/16FDDE0 | reserved | 2025-02-03
18:29:53.478023+05:30 |
(2 rows)
3. We have similar behaviour as above for physical slots.
Thanks and Regards,
Shlok Kyal
From | Date | Subject | |
---|---|---|---|
Next Message | Antonin Houska | 2025-02-03 13:07:55 | Re: why there is not VACUUM FULL CONCURRENTLY? |
Previous Message | Aleksander Alekseev | 2025-02-03 12:55:37 | Re: new commitfest transition guidance |