Re: Race conditions with checkpointer and shutdown

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Race conditions with checkpointer and shutdown
Date:	2019-04-19 04:02:48
Message-ID:	7164.1555646568@sss.pgh.pa.us
Lists:	pgsql-hackers

>>> Maybe what we should be looking for is "why doesn't the walreceiver
>>> shut down"? But the dragonet log you quote above shows the walreceiver
>>> exiting, or at least starting to exit. Tis a puzzlement.

huh ... take a look at this little stanza in PostmasterStateMachine:

    if (pmState == PM_SHUTDOWN_2)
    {
        /*
         * PM_SHUTDOWN_2 state ends when there's no other children than
         * dead_end children left. There shouldn't be any regular backends
         * left by now anyway; what we're really waiting for is walsenders and
         * archiver.
         *
         * Walreceiver should normally be dead by now, but not when a fast
         * shutdown is performed during recovery.
         */
        if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL) == 0 &&
            WalReceiverPID == 0)
        {
            pmState = PM_WAIT_DEAD_END;
        }
    }

I'm too tired to think through exactly what that last comment might be
suggesting, but it sure seems like it might be relevant to our problem.
If the walreceiver *isn't* dead yet, what's going to ensure that we
can move forward later?
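
To make the concern concrete, here is a minimal toy model of the check quoted above. It is a sketch under stated assumptions, not postmaster code: the Toy* types, toy_state_machine() and toy_reap_walreceiver() are hypothetical names invented for illustration, and liveChildren merely stands in for CountChildren(BACKEND_TYPE_ALL). It shows only that the transition out of PM_SHUTDOWN_2 depends on the check being run again after WalReceiverPID has been cleared. Whether the real postmaster always does that is exactly the open question.

```c
#include <stdbool.h>

/* Toy model only: names echo the quoted stanza, but none of this is
 * PostgreSQL source. */
typedef enum { PM_SHUTDOWN_2, PM_WAIT_DEAD_END } ToyPMState;

typedef struct
{
    ToyPMState pmState;
    int        PgArchPID;       /* 0 means "no such child" */
    int        WalReceiverPID;  /* 0 means "no such child" */
    int        liveChildren;    /* stand-in for CountChildren(BACKEND_TYPE_ALL) */
} ToyPostmaster;

/* One pass of the quoted check. Returns true if the state advanced. */
static bool
toy_state_machine(ToyPostmaster *pm)
{
    if (pm->pmState == PM_SHUTDOWN_2 &&
        pm->PgArchPID == 0 && pm->liveChildren == 0 &&
        pm->WalReceiverPID == 0)
    {
        pm->pmState = PM_WAIT_DEAD_END;
        return true;
    }
    return false;
}

/*
 * Hypothetical reaper: record that the walreceiver has exited and,
 * optionally, re-run the check. If the caller never re-runs it, the
 * toy postmaster stays in PM_SHUTDOWN_2 even though nothing is left
 * to wait for.
 */
static void
toy_reap_walreceiver(ToyPostmaster *pm, bool rerun_state_machine)
{
    pm->WalReceiverPID = 0;
    if (rerun_state_machine)
        toy_state_machine(pm);
}
```

In this model, a postmaster that enters PM_SHUTDOWN_2 with a nonzero WalReceiverPID does not advance on the first pass. It reaches PM_WAIT_DEAD_END only if something calls toy_state_machine() again after the walreceiver's exit has been recorded.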

regards, tom lane

In response to

Re: Race conditions with checkpointer and shutdown at 2019-04-19 03:48:07 from Tom Lane

Responses

Re: Race conditions with checkpointer and shutdown at 2019-04-28 00:56:51 from Tom Lane
