Re: Exit walsender before confirming remote flush in logical replication

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: amit(dot)kapila16(at)gmail(dot)com
Cc: sawada(dot)mshk(at)gmail(dot)com, kuroda(dot)hayato(at)fujitsu(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, peter(dot)eisentraut(at)enterprisedb(dot)com, dilipbalaut(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Exit walsender before confirming remote flush in logical replication
Date: 2023-02-02 04:34:23
Message-ID: 20230202.133423.283791550224495611.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Wed, 1 Feb 2023 14:58:14 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> On Wed, Feb 1, 2023 at 2:09 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Otherwise, we will end up terminating
> > the WAL stream without the done message. Which will lead to an error
> > message "ERROR: could not receive data from WAL stream: server closed
> > the connection unexpectedly" on the subscriber even at a clean
> > shutdown.
> >
>
> But will that be a problem? As per docs of shutdown [1] ( “Smart” mode
> disallows new connections, then waits for all existing clients to
> disconnect. If the server is in hot standby, recovery and streaming
> replication will be terminated once all clients have disconnected.),
> there is no such guarantee. I see that it is required for the
> switchover in physical replication to ensure that all the WAL is sent
> and replicated but we don't need that for logical replication.

+1

Since the publisher is not aware of the apply delay (with this patch),
the publisher does in fact appear to go away before sending EOS in that
case. The error message describes that situation correctly.

> > In a case where pq_is_send_pending() doesn't become false
> > for a long time, (e.g., the network socket buffer got full due to the
> > apply worker waiting on a lock), I think users should unblock it by
> > themselves. Or it might be practically better to shutdown the
> > subscriber first in the logical replication case, unlike the physical
> > replication case.
> >
>
> Yeah, will users like such a dependency? And what will they gain by doing so?

If PostgreSQL required that kind of special care at shutdown, when
facing trouble, in order to keep replication consistent, that would
not be acceptable. In this respect, the current time-delayed logical
replication can be seen as a kind of intentional, continuous, large
network lag. And I think consistency is guaranteed even in such cases.

On the other hand, I don't think most people care about the exact
progress when facing such trouble, as long as replication consistency
is maintained. IMHO that is also true for the delayed logical
replication case.

> [1] - https://www.postgresql.org/docs/devel/app-pg-ctl.html

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
