Re: Race conditions with checkpointer and shutdown

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Race conditions with checkpointer and shutdown
Date: 2019-04-19 04:02:48
Message-ID: 7164.1555646568@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>> Maybe what we should be looking for is "why doesn't the walreceiver
>>> shut down"? But the dragonet log you quote above shows the walreceiver
>>> exiting, or at least starting to exit. Tis a puzzlement.

huh ... take a look at this little stanza in PostmasterStateMachine:

if (pmState == PM_SHUTDOWN_2)
{
/*
* PM_SHUTDOWN_2 state ends when there's no other children than
* dead_end children left. There shouldn't be any regular backends
* left by now anyway; what we're really waiting for is walsenders and
* archiver.
*
* Walreceiver should normally be dead by now, but not when a fast
* shutdown is performed during recovery.
*/
if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL) == 0 &&
WalReceiverPID == 0)
{
pmState = PM_WAIT_DEAD_END;
}
}

I'm too tired to think through exactly what that last comment might be
suggesting, but it sure seems like it might be relevant to our problem.
If the walreceiver *isn't* dead yet, what's going to ensure that we
can move forward later?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-04-19 04:17:22 Re: bug in update tuple routing with foreign partitions
Previous Message Amit Langote 2019-04-19 04:00:24 Re: bug in update tuple routing with foreign partitions