Triaging pg_ctl shutdown hang

From: Joseph Hammerman <joe(dot)hammerman(at)datadoghq(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Triaging pg_ctl shutdown hang
Date: 2021-12-29 15:16:44
Message-ID: CAHs7QM_yx=KhjwHub7PyqvaosTpb9AQxXzHPFx7Pnu+0hvxLaw@mail.gmail.com
Lists: pgsql-admin

Hi pgsql-admins list,

We recently had an incident precipitated by a hanging fast shutdown
(pg_ctl stop -m fast) on PostgreSQL 9.6.22. Two processes would not exit:
the postmaster and the logger process. We had limited visibility into the
underlying conditions, since in fast shutdown mode the server terminates
existing sessions and rejects new connections. Even when we escalated the
shutdown to immediate mode, the processes did not exit.

I’m trying to put together a checklist of data for us to capture so that
we can determine the root cause of the hang if we encounter this issue
again. One example is running echo w > /proc/sysrq-trigger to make the
kernel log stack traces for all processes in uninterruptible sleep. Is it
worth stracing the postmaster process and surviving children? Does
pg_controldata surface any useful data?
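
To make the kind of capture step described above concrete, here is a
minimal sketch, assuming a Linux host with /proc mounted. The helper name
capture_proc_state and the output layout are hypothetical, not part of any
PostgreSQL tooling. It snapshots the kernel-visible state of one process,
where a "D" in the State: line means uninterruptible sleep.

```shell
#!/bin/sh
# Sketch only: snapshot the kernel-visible state of one process.
# Assumes Linux with /proc mounted. capture_proc_state is a hypothetical
# helper name, not a PostgreSQL utility.
capture_proc_state() {
    pid=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    out="$outdir/pid-$pid.txt"
    {
        # State: D in this file means uninterruptible sleep.
        echo "== /proc/$pid/status =="
        cat "/proc/$pid/status" 2>&1 || echo "(unreadable)"
        # Kernel symbol the process is currently waiting in, if any.
        echo "== /proc/$pid/wchan =="
        cat "/proc/$pid/wchan" 2>&1 || echo "(unreadable)"
        echo
        # Kernel stack of the process; reading it usually requires root.
        echo "== /proc/$pid/stack =="
        cat "/proc/$pid/stack" 2>&1 || echo "(unreadable)"
    } > "$out"
    # Print the path of the snapshot so callers can collect it.
    echo "$out"
}

# Example use (commented out; assumes PGDATA points at the data directory):
#   pm_pid=$(head -n 1 "$PGDATA/postmaster.pid")   # first line is the postmaster PID
#   for p in "$pm_pid" $(pgrep -P "$pm_pid"); do
#       capture_proc_state "$p" /tmp/pg-hang-diag
#   done
```

Running something like this before escalating the shutdown mode would
preserve the process state as it was during the hang. It only reads from
/proc, so it does not need a database connection.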

As a follow-up question, is there a way to obtain an administrative
backdoor, or to leave one open, while a fast shutdown is hanging?

Thanks in advance for any clarity or guidance anyone on the list can
provide.

Joe Hammerman
