From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Introduce XID age and inactive timeout based replication slot invalidation |
Date: | 2025-02-12 07:46:22 |
Message-ID: | OS0PR01MB5716C131A7D80DAE8CB9E88794FC2@OS0PR01MB5716.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
On Wednesday, February 12, 2025 11:56 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Feb 11, 2025 at 9:39 PM Nathan Bossart
> <nathandbossart(at)gmail(dot)com> wrote:
> >
> > On Tue, Feb 11, 2025 at 03:22:49PM +0100, Álvaro Herrera wrote:
> > > I find this proposed patch a bit strange and I feel it needs more
> > > explanation.
> > >
> > > When this thread started, Bharath justified his patches saying that
> > > a slot that's inactive for a very long time could be problematic
> > > because of XID wraparound. Fine, that sounds a reasonable feature.
> > > If you wanted to invalidate slots whose xmins were too old, I would
> > > support that. He submitted that as his 0004 patch then.
> > >
> > > However, he also chose to submit 0003 with invalidation based on a
> > > timeout. This is far less convincing a feature to me. The
> > > justification for the time out seems to be that ... it's difficult
> > > to have a one-size-fits-all value because size of disks vary. (???)
> > > Or something like that. Really? I mean -- yes, this will prevent
> > > problems in toy databases when run in developer's laptops. It will
> > > not prevent any problems in production databases. Do we really want
> > > a setting that is only useful for toy situations rather than production?
> > >
> > >
> ...
> > >
> > > I'm baffled.
> >
> > I agree, and I am also baffled because I think this discussion has
> > happened at least once already on this thread.
> >
>
> Yes, we previously discussed this topic and Robert seems to prefer a
> time-based parameter for invalidating the slot (1)(2) as it is easier to reason in
> terms of time. The other points discussed previously were that there are tools
> that create a lot of slots and sometimes forget to clean up slots. Bharath has
> seen this in production and we now have the tool pg_createsubscriber that
> creates a slot-per-database, so if for some reason, such slots are not cleaned
> on the tool's exit, such a parameter could save the cluster. See (3)(4).
>
> Also, we previously didn't have a good experience with XID-based threshold
> parameters like vacuum_defer_cleanup_age as mentioned by Robert (1).
> AFAICU from the previous discussion we need a time-based parameter and we
> didn't rule out xid_age based parameter as another parameter.

Yeah, I think the primary purpose of this time-based option is to invalidate
dormant replication slots that have been inactive for a long period, in which
case the slots are no longer useful.

Such slots can remain if a subscriber is down due to a system error or is
inaccessible because of network issues. If this situation persists, it might be
more practical to recreate the subscriber rather than attempt to recover the
node and wait for it to catch up, which could be time-consuming.

Parameters like max_slot_wal_keep_size and max_slot_xid_age do not
differentiate between active and inactive replication slots. Some customers I
have met are hesitant to use these settings, as they can sometimes invalidate
a slot unnecessarily and break replication.
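
To illustrate the distinction, here is a rough sketch of the two kinds of
checks. It is not the patch's code: the Slot class, its field names, and both
helper functions are hypothetical and exist only to show that a time-based rule
can skip slots that are in use, while a size-based rule cannot.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Slot:
    # Hypothetical stand-in for a replication slot; not a PostgreSQL structure.
    name: str
    active: bool
    inactive_since: Optional[datetime]  # None while the slot is in use
    retained_wal_bytes: int


def exceeds_idle_timeout(slot: Slot, now: datetime, idle_timeout: timedelta) -> bool:
    """Time-based rule: only a slot that is currently inactive can qualify."""
    if slot.active or slot.inactive_since is None:
        return False
    return now - slot.inactive_since > idle_timeout


def exceeds_wal_limit(slot: Slot, max_keep_bytes: int) -> bool:
    """Size-based rule: applies whether or not the slot is in use."""
    return slot.retained_wal_bytes > max_keep_bytes


# Illustrative values only.
now = datetime(2025, 2, 12, 8, 0, 0)
idle_timeout = timedelta(days=1)
max_keep = 10 * 1024**3

# A busy subscriber that is lagging, and a subscriber that went away a week ago.
busy = Slot("busy_sub", True, None, 50 * 1024**3)
dormant = Slot("dead_sub", False, now - timedelta(days=7), 1 * 1024**3)

for slot in (busy, dormant):
    print(
        slot.name,
        "idle-timeout:", exceeds_idle_timeout(slot, now, idle_timeout),
        "wal-limit:", exceeds_wal_limit(slot, max_keep),
    )
```

In this sketch the lagging but connected slot is caught only by the size-based
rule, and the abandoned slot only by the time-based one, which is the case the
time-based option is meant to cover.
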
> (1) - https://www.postgresql.org/message-id/CA%2BTgmoZTbaaEjSZUG1FL0mzxAdN3qmXksO3O9_PZhEuXTkVnRQ%40mail.gmail.com
> (2) - https://www.postgresql.org/message-id/CA%2BTgmoaRECcnyqxAxUhP5dk2S4HX%3DpGh-p-PkA3uc%2BjG_9hiMw%40mail.gmail.com
> (3) - https://www.postgresql.org/message-id/CALj2ACVFV%3DyUa3DXXfJLOtJxUM8qzC_mEECMJ2iekDGPeQLkTw%40mail.gmail.com
> (4) - https://www.postgresql.org/message-id/CAA4eK1L3awyzWMuymLJUm8SoFEQe%3DDa9KUwCcAfC31RNJ1xdJA%40mail.gmail.com

Best Regards,
Hou zj