
Re: walsender vs. XLogBackgroundFlush during shutdown

From:	Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
To:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: walsender vs. XLogBackgroundFlush during shutdown
Date:	2019-05-05 15:31:35
Message-ID:	CAFh8B=mOHwum5x3OhZ-P2KLuz3ObHyc97p2cdOxF5tZFusw5sg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, 2 May 2019 at 14:35, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >From the client side perspective, it confirmed everything that it
> >should, but from the postgres side, this is not enough to shut down
> >cleanly. Maybe it is possible to change the check (sentPtr ==
> >replicatedPtr) to something like (lastMsgSentPtr <= replicatedPtr) or
> >it would be unsafe?
>
> I don't know.
>
> In general I think it's a bit strange that we're waiting for walsender
> processes to catch up even in fast shutdown mode, instead of just aborting
> them like other backends. But I assume there are reasons for that. OTOH it
> makes us vulnerable to issues like this, when a (presumably) misbehaving
> downstream prevents a shutdown.

IMHO waiting until the remote side has received and flushed all changes
is the right strategy, but physical and logical replication should be
handled slightly differently.
For physical replication we want to make sure that the remote side has
received and flushed all changes; otherwise, in case of a switchover, we
won't be able to rejoin the former primary as a new standby.
The logical replication case is a bit different. I think we can safely
shut down the walsender once the client has confirmed the last XLogData
message, whereas now we wait until the client confirms the wal_end
received in the keepalive message. If we shut down the walsender too
early and do a switchover, the client might miss some events, because
logical slots are not replicated :(
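
To make the difference between the two checks concrete, here is a toy
model in Python. It is not PostgreSQL code: LSNs are plain integers, the
function names are made up, and it assumes that sentPtr can move past
the end of the last XLogData message (for example up to the wal_end
advertised in a keepalive). Only the variable names sentPtr,
replicatedPtr and lastMsgSentPtr come from the discussion above.

```python
# Toy model of the walsender shutdown check; NOT actual PostgreSQL code.
# LSNs are modelled as plain integers and all function names are
# illustrative assumptions.


def caught_up_current(sent_ptr: int, replicated_ptr: int) -> bool:
    """Check discussed in this thread: the client must confirm exactly sentPtr."""
    return sent_ptr == replicated_ptr


def caught_up_proposed(last_msg_sent_ptr: int, replicated_ptr: int) -> bool:
    """Proposed check: the client has confirmed the last XLogData message."""
    return last_msg_sent_ptr <= replicated_ptr


# The last XLogData message ended at LSN 1000 and the client confirmed it.
last_msg_sent_ptr = 1000
replicated_ptr = 1000

# Assumption: sentPtr has since moved on to 1200 without any new XLogData
# being sent; the client only hears about it through keepalives.
sent_ptr = 1200

print("current check passes: ", caught_up_current(sent_ptr, replicated_ptr))
print("proposed check passes:", caught_up_proposed(last_msg_sent_ptr, replicated_ptr))
```

In this scenario the client has confirmed everything it was actually
sent, yet only the proposed check would let the walsender shut down.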

> >No, it didn't stuck there. During the shutdown postgres starts sending
> >a few thousand keepalive messages per second and receives back so many
> >feedback message, therefore the chances of interrupting somewhere in
> >the send are quite high.
>
> Uh, that seems a bit broken, perhaps?

Indeed, this is broken psycopg2 behavior :(
I am thinking about submitting a patch fixing it.

Actually, I quickly skimmed through the pgjdbc logical replication
source code and the example at
https://jdbc.postgresql.org/documentation/head/replication.html, and I
think that it will also cause problems with the shutdown.

Regards,
--
Alexander Kukushkin

In response to

Re: walsender vs. XLogBackgroundFlush during shutdown at 2019-05-02 12:35:45 from Tomas Vondra
