Re: server process exited with code 1

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Eric Hill <Eric(dot)Hill(at)jmp(dot)com>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: server process exited with code 1
Date: 2021-04-23 23:12:41
Message-ID: 1624113.1619219561@sss.pgh.pa.us
Lists: pgsql-general

Eric Hill <Eric(dot)Hill(at)jmp(dot)com> writes:
> I don’t believe we have any unusual extensions. We do have triggers, and the VM does have antivirus protection. I’ll work on exclusions for the AV, and we’ll look into our triggers a bit.

BTW, I happened to check in our commit log to see when we installed the
dead man switch I referred to, and the commit message is quite
interesting:

Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Branch: master Release: REL8_4_BR [969d7cd43] 2009-05-05 19:59:00 +0000

Install a "dead man switch" to allow the postmaster to detect cases where
a backend has done exit(0) or exit(1) without having disengaged itself
from shared memory. We are at risk for this whenever third-party code is
loaded into a backend, since such code might not know it's supposed to go
through proc_exit() instead. Also, it is reported that under Windows
there are ways to externally kill a process that cause the status code
returned to the postmaster to be indistinguishable from a voluntary exit
(thank you, Microsoft). If this does happen then the system is probably
hosed --- for instance, the dead session might still be holding locks.
So the best recovery method is to treat this like a backend crash.

The dead man switch is armed for a particular child process when it
acquires a regular PGPROC, and disarmed when the PGPROC is released;
these should be the first and last touches of shared memory resources
in a backend, or close enough anyway. This choice means there is no
coverage for auxiliary processes, but I doubt we need that, since they
shouldn't be executing any user-provided code anyway.
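
[Illustration, not part of the original commit or the PostgreSQL source.]
The arm/disarm idea in the commit message can be sketched as a toy model. This is a minimal sketch assuming a Unix-like system with os.fork(); it uses one byte of anonymous shared memory per child in place of a PGPROC-linked flag. The names switches, run_child and reap are hypothetical. The real implementation is in C inside the postmaster and differs in detail.

```python
import mmap
import os

# Hypothetical stand-in for the shared-memory flags: one byte per child slot.
# 1 = switch armed, 0 = switch disarmed.
MAX_CHILDREN = 4
switches = mmap.mmap(-1, MAX_CHILDREN)  # anonymous mapping, shared across fork on Unix


def run_child(slot, clean_shutdown):
    """Fork a child that arms its switch and then exits with status 1."""
    pid = os.fork()
    if pid == 0:
        switches[slot] = 1      # arm: first touch of shared state
        if clean_shutdown:
            switches[slot] = 0  # disarm: what an orderly exit path would do
        os._exit(1)             # the exit status is 1 in both cases
    return pid


def reap(pid, slot):
    """Parent side: classify the child's exit by looking at its switch."""
    _, status = os.waitpid(pid, 0)
    exited_0_or_1 = os.WIFEXITED(status) and os.WEXITSTATUS(status) in (0, 1)
    if exited_0_or_1 and switches[slot] == 1:
        switches[slot] = 0      # reset the slot for reuse
        return "treat as crash"
    return "normal exit"
```

In this sketch, reap(run_child(0, clean_shutdown=False), 0) classifies the exit as a crash, while the same call with clean_shutdown=True does not, even though both children exit with status 1. That is the distinction the exit code alone cannot provide.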

So it seems like you'd better be looking into the possibility that
something entirely external to Postgres is causing that backend process
to quit with what looks like _exit(1). Unfortunately, this commit is
from before we had a habit of including links to mailing list
discussions in commit messages; but if you dig around in the pgsql-hackers
archives near that date you can probably find the thread, and maybe
there will be more info about the Windows aspect. (I'm not a Windows
guy, so this was purely hearsay on my part.)

regards, tom lane
