Re: Taking into account syncrep position in flush_lsn reported by apply worker

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Arseny Sher <ars(at)neon(dot)tech>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Taking into account syncrep position in flush_lsn reported by apply worker
Date: 2024-08-21 08:59:43
Message-ID: CAA4eK1KzswP=27k0bNBysh9PhQP2MhbYtoU_Q6kD=fRK-ZoHzA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 21, 2024 at 12:40 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 21/08/2024 09:25, Amit Kapila wrote:
> >>
> >> I think this patch makes sense. I'm not sure we've actually made any
> >> promises on it, but it feels wrong that the slot's LSN might be advanced
> >> past the LSN that's been acknowledged by the replica, if
> >> synchronous replication is configured. I see little downside in making
> >> that promise.
> >
> > One possible downside of such a promise could be that the publisher
> > may slow down for sync replication because it has to wait for all the
> > configured sync_standbys of subscribers to acknowledge the LSN. I
> > don't know how many applications can be impacted due to this if we do
> > it by default but if we feel there won't be any such cases or they
> > will be in the minority then it is okay to proceed with this.
>
> It only slows down updating the flush LSN on the publisher, which is
> updated quite lazily anyway.
>

But doesn't that also mean that if the logical subscriber is
configured in synchronous_standby_names, the publisher's
transactions also need to wait for that update? We do update it
lazily, but today the transaction on the publisher is released as
soon as the operation is applied on the subscriber. IIUC, the same
won't be true after the patch.
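
To make the concern concrete, here is a toy model of it. This is not
PostgreSQL code: the type and function names are made up for
illustration, and it assumes for simplicity that both positions are
expressed as publisher LSNs. It only shows that once the reported flush
position is capped by what the subscriber's sync standbys have
acknowledged, a publisher commit waiting on that subscriber stays
blocked for longer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these names exist in PostgreSQL. */
typedef uint64_t Lsn;

/*
 * Flush position the apply worker would report back to the publisher.
 *
 * local_flush      - how far the subscriber has flushed applied changes
 * syncrep_ack      - how far the subscriber's own sync standbys have
 *                    acknowledged (assumed comparable to local_flush)
 * wait_for_syncrep - true models the behaviour proposed by the patch
 */
Lsn
reported_flush_lsn(Lsn local_flush, Lsn syncrep_ack, bool wait_for_syncrep)
{
    if (wait_for_syncrep && syncrep_ack < local_flush)
        return syncrep_ack;     /* held back by the sync standbys */
    return local_flush;
}

/*
 * With the subscriber listed in the publisher's
 * synchronous_standby_names, a commit is released only once the
 * reported flush position has reached the commit's LSN.
 */
bool
publisher_commit_released(Lsn commit_lsn, Lsn reported_flush)
{
    return reported_flush >= commit_lsn;
}
```

For example, with local_flush = 100, syncrep_ack = 80 and a commit at
LSN 90, the commit is released when wait_for_syncrep is false but not
when it is true.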

> A more serious scenario is if the sync replica crashes or is not
> responding at all. In that case, the flush LSN on the publisher cannot
> advance, and WAL starts to accumulate. However, if a sync replica is not
> responding, that's very painful for the (subscribing) server anyway: all
> commits will hang waiting for the replica. Holding back the flush LSN on
> the publisher seems like a minor problem compared to that.
>

Yeah, but as per my understanding, that also means holding up all the
active work/transactions on the publisher, which doesn't sound like a
minor problem.

> It would be good to have some kind of an escape hatch though. If you get
> into that situation, is there a way to advance the publisher's flush LSN
> even though the synchronous replica has crashed? You can temporarily
> turn off synchronous replication on the subscriber. That will release
> any COMMITs on the server too. In theory you might not want that, but in
> practice stuck COMMITs are so painful that if you are taking manual
> action, you probably do want to release them as well.
>

This will work in the scenario you mentioned.

If the above understanding is correct and you agree that it is not a
good idea to hold back transactions on the publisher, then we can think
of a new subscription option that allows the apply worker to wait for
synchronous replicas.

--
With Regards,
Amit Kapila.
