RFC: Giving bgworkers walsender-like grace during shutdown (for logical replication)

From: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Markus Wanner <markus(dot)wanner(at)enterprisedb(dot)com>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
Subject: RFC: Giving bgworkers walsender-like grace during shutdown (for logical replication)
Date: 2020-12-07 03:33:57
Message-ID: CAGRY4nzO7-0Q6UkzOHKRTrLLYVNbq8hnGaO-o9eSBmNhY-Jo=g@mail.gmail.com
Lists: pgsql-hackers

Hi folks

TL;DR: Does anyone object to a new bgworker flag that exempts background
workers (such as logical apply workers) from the first round of fast
shutdown signals, and instead lets them finish their currently in-progress
txn and exit?

This is a change proposal raised for comment before I submit a patch, so
please take a look. The explanation of why I think we need it comes first,
then the proposed implementation.

Rationale:

Currently a fast shutdown causes logical replication subscribers to abort
their currently in-progress transaction and terminate along with user
backends. This means they cannot finish receiving and flushing the
currently in-progress transaction, possibly wasting a very large amount of
work.

After restart the subscriber must reconnect, the upstream must decode and
reorder-buffer again from the restart_lsn up to the current
confirmed_flush_lsn, and the subscriber must receive the whole txn on the
wire all over again and apply the whole txn again locally. We don't
currently spool received txn change-streams to disk on the subscriber and
flush them, so we can't repeat just the local apply part (see the related
thread "Logical archiving" for relevant discussion there). This can create
a lot of bloat, a lot of excess WAL, etc., if a big txn was in progress at
the time.

I'd like to add a bgworker flag that tells the postmaster to treat the
logical apply bgworker (or extension equivalents) somewhat like a walsender
for the purpose of fast shutdown. Instead of immediately terminating it
like user backends on fast shutdown, the bgworker should be sent a
ProcSignal warning that shutdown is pending and instructing it to finish
receiving and applying its current transaction, then exit gracefully.

It's not quite the same as the walsender, since there we try to flush
changes to downstreams up to the end of the last commit before shutting
down. That doesn't make sense on a subscriber because the upstream is
likely still generating txns. We just want to avoid wasting our effort on
any in-flight txn.
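
To make the intended worker-side behaviour concrete, here is a minimal
sketch, not actual apply-worker code. It models the incoming change stream
as an array of messages and assumes the shutdown notification is only acted
on at a transaction boundary. The names MsgType, apply_until_shutdown and
shutdown_at are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a logical replication stream. */
typedef enum { MSG_BEGIN, MSG_CHANGE, MSG_COMMIT } MsgType;

/*
 * Apply messages until the stream ends or a shutdown has been requested
 * and no transaction is in progress. shutdown_at is the index at which the
 * (hypothetical) shutdown notification becomes visible; pass n for "never".
 * Returns the number of messages applied before exiting.
 */
size_t
apply_until_shutdown(const MsgType *stream, size_t n, size_t shutdown_at)
{
    bool   in_txn = false;
    size_t i;

    assert(stream != NULL || n == 0);

    for (i = 0; i < n; i++)
    {
        bool shutdown_requested = (i >= shutdown_at);

        /* Only exit between transactions, never in the middle of one. */
        if (shutdown_requested && !in_txn)
            break;

        if (stream[i] == MSG_BEGIN)
            in_txn = true;
        else if (stream[i] == MSG_COMMIT)
            in_txn = false;
        /* MSG_CHANGE: would apply the change locally here. */
    }
    return i;
}
```

With a stream of BEGIN, CHANGE, CHANGE, COMMIT, BEGIN, CHANGE, COMMIT and a
notification arriving during the first txn, the worker applies through the
first COMMIT and exits without starting the second txn.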

Any objections?

Proposed implementation:

* Add a new bgworker flag, something like BGW_DELAYED_SHUTDOWN.

* Define a new ProcSignal, PROCSIG_SHUTDOWN_REQUESTED. On fast shutdown,
send this instead of a SIGTERM to bgworker backends flagged
BGW_DELAYED_SHUTDOWN. On smart shutdown, send it to all backends when the
shutdown request arrives, since that could be handy for other uses too.

* A flagged bgworker is expected to finish its current txn and exit
promptly. Impose a grace period after which it gets SIGTERM'd anyway. Also
send a SIGTERM if the postmaster receives a second fast shutdown request.

* Defer sending PROCSIG_WALSND_INIT_STOPPING to walsenders until all
BGW_DELAYED_SHUTDOWN flagged bgworkers have exited, so we can ensure that
cascaded downstreams receive any txns applied from the upstream.

This doesn't look likely to be particularly complicated to implement.

It might be better to use a flag in PGPROC rather than the bgworker struct,
both in case we want to extend this to other backend types in future and to
make it easier for the postmaster to check the flag during shutdown. We
could just claim a bit from statusFlags for the purpose. Thoughts?
