logical replication walsender loop preventing a clean shutdown

From:	Greg Sabino Mullane <htamfids(at)gmail(dot)com>
To:	pgsql-bugs(at)lists(dot)postgresql(dot)org, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject:	logical replication walsender loop preventing a clean shutdown
Date:	2024-09-16 18:27:42
Message-ID:	CAKAnmm+STYvW_5aRx2C0QWgbNpd_zEjruc6MytePnRuK8oKtTA@mail.gmail.com
Lists:	pgsql-bugs

When doing logical replication, a large transaction can prevent the
postgres process from shutting down until the WAL has all been processed
and the client reports back. This is obviously less than ideal, as it means
a pg_ctl stop -m fast can take minutes or hours to complete. I would expect
the behavior to be that all backends are signalled so they can leave
cleanly.

I found this thread that reports something very similar (but without the
infinite looping):

Subject: walsender bug: stuck during shutdown
https://www.postgresql.org/message-id/flat/20201123205253.GA10075%40alvherre.pgsql

I have cc'd Alvaro in case he has any progress on this, or ideas. I tried
applying the patch from that thread, but the behavior remained unchanged.
Wanted to raise this in -bugs for added visibility, and also see if anyone
had thoughts before I dig deeper.

My test case (tested with latest, as of commit
b8ea0f675f35c3f0c2cf62175517ba0dacad4abd):

* Spin up a cluster on port 5555 with wal_level = logical
* pg_recvlogical --create-slot -d postgres -p 5555 --slot=foo
* pg_recvlogical --start -d postgres -p 5555 --slot=foo --file /tmp/tmp
* If all is well, ctrl-z, bg 1, watch -n 3 tail /tmp/tmp

Other session:
* psql -p 5555 postgres
* create table t (id int generated always as identity, foo text);
* insert into t(foo) select 'abcdefghijklmnopqrstuvwxyz' from generate_series(1,10_000_000);

Once the commit finishes, and as soon as pg_recvlogical starts processing
it:

* time pg_ctl stop -m fast -w -t 10000

I found 10 million a nice test on my system - shutdown takes an additional
50 seconds or so, as it waits for pg_recvlogical to respond.
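
For convenience, the steps above could be collected into one script along these lines. This is only a rough sketch, not the exact procedure used for the report: the scratch data directory, the RUN_REPRO guard, the helper names insert_sql and run_repro, and the wait loop that greps for 'INSERT' (which assumes the default test_decoding plugin output) are all illustrative assumptions. The row count is written without underscores so the statement also parses on older servers.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps; paths and helper names are illustrative.
set -euo pipefail

PGDATA_DIR="${PGDATA_DIR:-/tmp/walsender_repro}"  # assumed scratch data directory
PORT=5555
SLOT=foo
OUT=/tmp/tmp
ROWS="${ROWS:-10000000}"

# Build the INSERT that generates one large transaction of $1 rows.
insert_sql() {
  printf "insert into t(foo) select 'abcdefghijklmnopqrstuvwxyz' from generate_series(1,%s);" "$1"
}

run_repro() {
  # Fresh cluster with logical decoding enabled.
  initdb -D "$PGDATA_DIR" >/dev/null
  pg_ctl -D "$PGDATA_DIR" -o "-p $PORT -c wal_level=logical" \
         -l "$PGDATA_DIR/server.log" -w start

  # Create the slot and start streaming into $OUT in the background.
  pg_recvlogical --create-slot -d postgres -p "$PORT" --slot="$SLOT"
  pg_recvlogical --start -d postgres -p "$PORT" --slot="$SLOT" --file "$OUT" &

  psql -p "$PORT" -d postgres \
       -c "create table t (id int generated always as identity, foo text);"
  psql -p "$PORT" -d postgres -c "$(insert_sql "$ROWS")"

  # Wait until decoded rows start arriving (assumes test_decoding output
  # lines containing the word INSERT).
  until grep -q 'INSERT' "$OUT" 2>/dev/null; do sleep 1; done

  # Time the fast shutdown while the walsender is still busy.
  time pg_ctl -D "$PGDATA_DIR" stop -m fast -w -t 10000
}

# Nothing touches a server unless explicitly requested.
if [ "${RUN_REPRO:-0}" = "1" ]; then
  run_repro
fi
```

Run it as RUN_REPRO=1 ./repro.sh (the file name is arbitrary), and lower ROWS if 10 million is too slow on your machine.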

Cheers,
Greg

Responses

Re: logical replication walsender loop preventing a clean shutdown at 2024-09-16 22:56:45 from Masahiko Sawada
