| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> | 
| Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> | 
| Subject: | Re: Race conditions with checkpointer and shutdown | 
| Date: | 2019-04-19 04:02:48 | 
| Message-ID: | 7164.1555646568@sss.pgh.pa.us | 
| Lists: | pgsql-hackers | 
>>> Maybe what we should be looking for is "why doesn't the walreceiver
>>> shut down"?  But the dragonet log you quote above shows the walreceiver
>>> exiting, or at least starting to exit.  Tis a puzzlement.
huh ... take a look at this little stanza in PostmasterStateMachine:
    if (pmState == PM_SHUTDOWN_2)
    {
        /*
         * PM_SHUTDOWN_2 state ends when there's no other children than
         * dead_end children left. There shouldn't be any regular backends
         * left by now anyway; what we're really waiting for is walsenders and
         * archiver.
         *
         * Walreceiver should normally be dead by now, but not when a fast
         * shutdown is performed during recovery.
         */
        if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL) == 0 &&
            WalReceiverPID == 0)
        {
            pmState = PM_WAIT_DEAD_END;
        }
    }
I'm too tired to think through exactly what that last comment might be
suggesting, but it sure seems like it might be relevant to our problem.
If the walreceiver *isn't* dead yet, what's going to ensure that we
can move forward later?
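
To make the worry concrete, here is a toy model of that condition. It is not PostgreSQL source: the `ToyPostmaster` struct, `toy_state_machine`, `toy_reap_walreceiver`, and `toy_simulate` are all hypothetical names, and only the three-part test is copied from the stanza above. The sketch assumes the transition can happen only when something re-evaluates the test after the walreceiver's PID has been cleared. Whether the real postmaster always arranges that is exactly the open question.

```c
/* Toy model only: NOT PostgreSQL source. Field names echo the quoted
 * stanza; the struct and all toy_* helpers are hypothetical. */
#include <assert.h>
#include <stdbool.h>

typedef enum { PM_SHUTDOWN_2, PM_WAIT_DEAD_END } ToyPMState;

typedef struct
{
    ToyPMState pmState;
    int        PgArchPID;       /* 0 means "no such child" */
    int        WalReceiverPID;  /* 0 means "no such child" */
    int        liveChildren;    /* stands in for CountChildren(BACKEND_TYPE_ALL) */
} ToyPostmaster;

/* Applies the same three-part test as the quoted stanza. */
static void
toy_state_machine(ToyPostmaster *pm)
{
    if (pm->pmState == PM_SHUTDOWN_2 &&
        pm->PgArchPID == 0 && pm->liveChildren == 0 &&
        pm->WalReceiverPID == 0)
        pm->pmState = PM_WAIT_DEAD_END;
}

/* Hypothetical reaper: records the walreceiver's exit and, if asked,
 * re-runs the state machine afterwards. */
static void
toy_reap_walreceiver(ToyPostmaster *pm, bool rerun)
{
    assert(pm->WalReceiverPID != 0);
    pm->WalReceiverPID = 0;
    if (rerun)
        toy_state_machine(pm);
}

/* Start in PM_SHUTDOWN_2 with a live walreceiver, let it exit, and
 * report the state we end up in. */
ToyPMState
toy_simulate(bool rerun_after_exit)
{
    ToyPostmaster pm = { PM_SHUTDOWN_2, 0, 4242, 0 };

    toy_state_machine(&pm);     /* walreceiver still alive: no transition */
    assert(pm.pmState == PM_SHUTDOWN_2);

    toy_reap_walreceiver(&pm, rerun_after_exit);
    return pm.pmState;
}
```

In this model, `toy_simulate(true)` reaches PM_WAIT_DEAD_END, while `toy_simulate(false)` stays in PM_SHUTDOWN_2 because nothing looks at the condition again. The model also takes for granted that the walreceiver exits at all, which is the other half of the question.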
regards, tom lane