From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | Chloe Dives <Chloe(dot)Dives(at)cantabcapital(dot)com>, Chris Wilson <chris(dot)wilson(at)cantabcapital(dot)com> |
Subject: | Re: walsender bug: stuck during shutdown |
Date: | 2020-11-24 07:35:14 |
Message-ID: | 94910fe9-a720-7f49-c678-d9a16d42e6fb@oss.nttdata.com |
Lists: | pgsql-hackers |
On 2020/11/24 5:52, Alvaro Herrera wrote:
> Hello
>
> Chloe Dives reported that sometimes a walsender would become stuck
> during shutdown and *not* shutdown, thus preventing postmaster from
> completing the shutdown cycle. This has been observed to cause the
> servers to remain in such state for several hours.
>
> After a lengthy investigation and thanks to a handy reproducer by Chris
> Wilson, we found that the problem is that WalSndDone wants to avoid
> shutting down until everything has been sent and acknowledged; but this
> test is coded in a way that ignores the possibility that we have never
> received anything from the other end. In that case, both
> MyWalSnd->flush and MyWalSnd->write are InvalidXLogRecPtr, so the condition
> in WalSndDone to terminate the loop is never fulfilled. So the
> walsender is looping forever and never terminates, blocking shutdown of
> the whole instance.
>
> The attached patch fixes the problem by testing for the problematic
> condition.
>
> Apparently this problem has existed forever. Fujii-san almost patched
> for it in 5c6d9fc4b2b8 (2014!), but missed it by a zillionth of an inch.

Thanks for working on this!

Could you point me to the discussion thread where Chloe Dives reported the issue?
Sorry, I could not find it.
I'd like to see the procedure to reproduce the issue.
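
To make the condition described in the quoted message concrete, here is a minimal standalone sketch in C. It is not the walsender source and not the attached patch: `WalSndModel`, `done_check_as_described` and `done_check_with_guard` are hypothetical names, `XLogRecPtr` and `InvalidXLogRecPtr` are simplified stand-ins for the PostgreSQL definitions, and the guard is only one assumed way of "testing for the problematic condition".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL types (assumption, not the real headers). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical model of the two positions the receiver reports back. */
typedef struct WalSndModel
{
    XLogRecPtr  write;      /* last write position reported by the receiver */
    XLogRecPtr  flush;      /* last flush position reported by the receiver */
} WalSndModel;

/*
 * Exit test as described in the quoted message: use the flush position if
 * it is valid, otherwise the write position, and allow shutdown only once
 * that position equals what has been sent.
 */
bool
done_check_as_described(const WalSndModel *ws, XLogRecPtr sentPtr,
                        bool caughtUp, bool sendPending)
{
    XLogRecPtr  replicatedPtr;

    assert(ws != NULL);
    replicatedPtr = (ws->flush == InvalidXLogRecPtr) ? ws->write : ws->flush;

    return caughtUp && sentPtr == replicatedPtr && !sendPending;
}

/*
 * Hypothetical variant (an assumption, not the actual patch): if the
 * receiver has never reported any position, do not wait for an
 * acknowledgement that may never arrive.
 */
bool
done_check_with_guard(const WalSndModel *ws, XLogRecPtr sentPtr,
                      bool caughtUp, bool sendPending)
{
    bool        never_replied;

    assert(ws != NULL);
    never_replied = (ws->write == InvalidXLogRecPtr &&
                     ws->flush == InvalidXLogRecPtr);

    if (caughtUp && !sendPending && never_replied)
        return true;

    return done_check_as_described(ws, sentPtr, caughtUp, sendPending);
}
```

With both reported positions invalid and a nonzero `sentPtr`, the first function returns false on every call, which matches the endless loop described above; the guarded variant returns true once everything has been sent and nothing is pending. Whether exiting without any acknowledgement is acceptable in that situation is a design question for the actual fix.
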
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION