Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers
Date: 2022-01-07 18:43:08
Message-ID: CALj2ACVikJfpNHStSaYaSxwopEVg=y36dYA5Jq8rjoC+9cX+wQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 7, 2022 at 12:54 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Wed, 2022-01-05 at 23:59 -0800, SATYANARAYANA NARLAPURAM wrote:
> > I would like to propose a GUC send_Wal_after_quorum_committed which
> > when set to ON, walsenders corresponds to async standbys and logical
> > replication workers wait until the LSN is quorum committed on the
> > primary before sending it to the standby. This not only simplifies
> > the post failover steps but avoids unnecessary downtime for the async
> > replicas. Thoughts?
>
> Do we need a GUC? Or should we just always require that sync rep is
> satisfied before sending to async replicas?
>
> It feels like the sync quorum should always be ahead of the async
> replicas. Unless I'm missing a use case, or there is some kind of
> performance gotcha.

IMO, having GUC is a reasonable choice because some users might be
okay with it if their async replicas are ahead of the sync ones or
they would have dealt with this problem already in their HA solutions
or they don't want their async replicas to fall behind by the primary
(most of the times).

If there are long running txns on the primary and the async standbys
were to wait until quorum commit from sync standbys, won't they fall
behind the primary by too much? This isn't a problem at all if we
think from the perspective that async replicas are anyways prone to
falling behind by the primary. But, if the primary is having long
running txns continuously, the async replicas would eventually fall
behind more and more. Is there a way we can send the WAL records to
both sync and async replicas together but the async replicas won't
apply those WAL records until primary tells the standbys that quorum
commit is obtained? If the quorum commit isn't obtained by the
primary, the async replicas can ignore to apply the WAL records and
discard them.

Regards,
Bharath Rupireddy.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2022-01-07 18:46:00 Re: libpq compression (part 2)
Previous Message Bossart, Nathan 2022-01-07 18:23:24 Re: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers