From: | Alexander Kukushkin <cyberdemn(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: walsender vs. XLogBackgroundFlush during shutdown |
Date: | 2019-05-05 15:31:35 |
Message-ID: | CAFh8B=mOHwum5x3OhZ-P2KLuz3ObHyc97p2cdOxF5tZFusw5sg@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, 2 May 2019 at 14:35, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >From the client side perspective, it confirmed everything that it
> >should, but from the postgres side, this is not enough to shut down
> >cleanly. Maybe it is possible to change the check (sentPtr ==
> >replicatedPtr) to something like (lastMsgSentPtr <= replicatedPtr) or
> >it would be unsafe?
>
> I don't know.
>
> In general I think it's a bit strange that we're waiting for walsender
> processes to catch up even in fast shutdown mode, instead of just aborting
> them like other backends. But I assume there are reasons for that. OTOH it
> makes us vulnerable to issues like this, when a (presumably) misbehaving
> downstream prevents a shutdown.
IMHO waiting until the remote side has received and flushed all changes
is the right strategy, but physical and logical replication should be
handled slightly differently.
For physical replication we want to make sure that the remote side has
received and flushed all changes; otherwise, in case of a switchover, we
won't be able to rejoin the former primary as a new standby.
The logical replication case is a bit different. I think we can safely
shut down the walsender once the client has confirmed the last XLogData
message, whereas currently we wait until the client confirms the wal_end
it received in the keepalive message. That said, if we shut down the
walsender too early and then do a switchover, the client might miss some
events, because logical slots are not replicated :(
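To make the difference between the two checks concrete, here is a
minimal Python model. It is only a sketch: sentPtr, replicatedPtr and
lastMsgSentPtr are the names used earlier in this thread, while the
function names and the LSN values are made up for illustration and are
not PostgreSQL code.

```python
def can_stop_current(sent_ptr: int, replicated_ptr: int) -> bool:
    """Check discussed in this thread: the walsender stops only once the
    client has confirmed exactly sentPtr."""
    return sent_ptr == replicated_ptr


def can_stop_proposed(last_msg_sent_ptr: int, replicated_ptr: int) -> bool:
    """Hypothetical relaxed check: it is enough that the client confirmed
    the end of the last XLogData message that was actually sent."""
    return last_msg_sent_ptr <= replicated_ptr


# Illustrative LSNs (plain integers, not real WAL positions).
last_msg_sent_ptr = 100  # end of the last XLogData message sent
sent_ptr = 120           # wal_end advertised in the keepalive
replicated_ptr = 100     # flush position confirmed by the client

print(can_stop_current(sent_ptr, replicated_ptr))            # False
print(can_stop_proposed(last_msg_sent_ptr, replicated_ptr))  # True
```

In this scenario the client has confirmed every XLogData message it was
sent, yet the equality check never becomes true because the keepalive
advertises a later position.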
> >No, it didn't stuck there. During the shutdown postgres starts sending
> >a few thousand keepalive messages per second and receives back so many
> >feedback message, therefore the chances of interrupting somewhere in
> >the send are quite high.
>
> Uh, that seems a bit broken, perhaps?
Indeed, this is broken psycopg2 behavior :(
I am thinking about submitting a patch to fix it.
Actually, I quickly skimmed through the pgjdbc logical replication
source code and the example at
https://jdbc.postgresql.org/documentation/head/replication.html and I
think that it will also cause problems with the shutdown.
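One possible client-side approach, sketched below in Python, is to
report the keepalive's wal_end as flushed whenever the client has
nothing pending. This is not psycopg2 or pgjdbc code: the class
FeedbackTracker and all of its methods are hypothetical, and the sketch
assumes that WAL between the last XLogData message and wal_end contains
nothing the client needs.

```python
class FeedbackTracker:
    """Hypothetical helper that decides which flush position a logical
    replication client reports back to the walsender."""

    def __init__(self) -> None:
        self.received_lsn = 0  # end of the last XLogData message received
        self.flushed_lsn = 0   # end of the last XLogData message flushed
        self.wal_end = 0       # latest wal_end seen in a keepalive

    def on_xlogdata(self, end_lsn: int) -> None:
        self.received_lsn = max(self.received_lsn, end_lsn)

    def on_flushed(self, end_lsn: int) -> None:
        self.flushed_lsn = max(self.flushed_lsn, end_lsn)

    def on_keepalive(self, wal_end: int) -> None:
        self.wal_end = max(self.wal_end, wal_end)

    def flush_position(self) -> int:
        # Assumption: if every received message has been flushed, the
        # client may confirm wal_end, since nothing is pending.
        if self.flushed_lsn >= self.received_lsn:
            return max(self.flushed_lsn, self.wal_end)
        return self.flushed_lsn


tracker = FeedbackTracker()
tracker.on_xlogdata(100)         # data received but not yet flushed
tracker.on_keepalive(120)        # server advertises wal_end = 120
print(tracker.flush_position())  # 0: data is still pending
tracker.on_flushed(100)
print(tracker.flush_position())  # 120: nothing pending, confirm wal_end
```

With feedback like this, the position confirmed by the client can reach
the wal_end the walsender is waiting for, so the existing equality check
on the server side could succeed without any change there.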
Regards,
--
Alexander Kukushkin