Re: Reviving lost replication slots

From: sirisha chamarthi <sirichamarthi22(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Reviving lost replication slots
Date: 2022-11-09 03:26:45
Message-ID: CAKrAKeXaSAW4wgGrZgaons4Z8sBTCy_FCKhvgiB000=FO=gbfw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 8, 2022 at 1:36 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Tue, Nov 8, 2022 at 12:08 PM sirisha chamarthi
> <sirichamarthi22(at)gmail(dot)com> wrote:
> >
> > On Fri, Nov 4, 2022 at 11:02 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >>
> >> On Fri, Nov 4, 2022 at 1:40 PM sirisha chamarthi
> >> <sirichamarthi22(at)gmail(dot)com> wrote:
> >> >
> >> > A replication slot can be lost when a subscriber is not able to catch
> >> > up with the load on the primary and the WAL to catch up exceeds
> >> > max_slot_wal_keep_size. When this happens, the target has to be reseeded
> >> > (pg_dump) from scratch, and this can take a long time. I am investigating
> >> > options to revive a lost slot.
> >> >
> >>
> >> Why in the first place one has to set max_slot_wal_keep_size if they
> >> care for WAL more than that?
> >
> > Disk full is a typical use case where we can't wait for the logical slots
> > to catch up before truncating the log.
> >
>
> Ideally, in such a case the subscriber should fall back to the
> physical standby of the publisher but unfortunately, we don't yet have
> a functionality where subscribers can continue logical replication
> from physical standby. Do you think if we had such functionality it
> would serve our purpose?
>

I don't think streaming from a standby helps, since the disk layout is
expected to be the same on the physical standby and the primary.

> >> If you have a case where you want to
> >> handle this case for some particular slot (where you are okay with the
> >> invalidation of other slots exceeding max_slot_wal_keep_size) then the
> >> other possibility could be to have a similar variable at the slot
> >> level but not sure if that is a good idea because you haven't
> >> presented any such case.
> >
> > IIUC, the ability to fetch WAL from the archive as a fallback mechanism
> > should automatically take care of all the lost slots. Do you see a need to
> > take care of a specific slot?
> >
>
> No, I was just trying to see if your use case can be addressed in some
> other way. BTW, won't copying the WAL again back from archive can lead
> to a disk full situation.
>
The idea is to download WAL segments from the archive on demand, as the slot
requires them, and to discard each segment once it has been processed.
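
To make the disk-usage argument concrete, here is a minimal sketch of that
fetch-process-discard loop. It is only an illustration, not a proposed
implementation: the function names, the directory-based "archive", and the
process callback are all hypothetical, and a plain file copy stands in for
whatever command actually retrieves a segment from the archive.

```python
import os
import shutil
import tempfile
from typing import Callable, Iterable, List, Tuple


def stream_from_archive(
    segments: Iterable[str],
    archive_dir: str,
    scratch_dir: str,
    process: Callable[[str], None],
) -> int:
    """Fetch each segment on demand, process it, then discard it.

    Returns the peak number of segment files seen in scratch_dir.
    All names here are hypothetical, for illustration only.
    """
    peak = 0
    for name in segments:
        local_path = os.path.join(scratch_dir, name)
        # Stand-in for fetching one segment from the archive.
        shutil.copyfile(os.path.join(archive_dir, name), local_path)
        peak = max(peak, len(os.listdir(scratch_dir)))
        try:
            process(local_path)
        finally:
            # Discard the segment so local disk usage does not grow.
            os.remove(local_path)
    return peak


def demo() -> Tuple[List[str], int, List[str]]:
    """Run the loop over three fake segments in temporary directories."""
    names = [
        "000000010000000000000001",
        "000000010000000000000002",
        "000000010000000000000003",
    ]
    with tempfile.TemporaryDirectory() as archive, \
            tempfile.TemporaryDirectory() as scratch:
        for name in names:
            with open(os.path.join(archive, name), "wb") as f:
                f.write(name.encode())
        seen: List[str] = []
        peak = stream_from_archive(
            names, archive, scratch,
            lambda path: seen.append(os.path.basename(path)),
        )
        return seen, peak, os.listdir(scratch)
```

In this sketch at most one fetched segment exists locally at any time, so
the extra disk needed is bounded by a single segment rather than by how far
behind the slot is.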

>
> --
> With Regards,
> Amit Kapila.
>
