Re: [HACKERS] parallel.c oblivion of worker-startup failures

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] parallel.c oblivion of worker-startup failures
Date:	2018-01-24 09:57:27
Message-ID:	CAEepm=1M-MSBA8rXxHgiK2Lgm3HOOuk987HSPmEnbWBtWNmBQQ@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jan 24, 2018 at 5:25 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> If there were some way for the postmaster to cause reason
> PROCSIG_PARALLEL_MESSAGE to be set in the leader process instead of
> just notification via kill(SIGUSR1) when it fails to fork a parallel
> worker, we'd get (1) for free in any latch/CFI loop code. But I
> understand that we can't do that by project edict.

Based on the above observation, here is a terrible idea you'll all
hate. It is pessimistic and expensive: it thinks that every latch
wake might be the postmaster telling us it's failed to fork() a
parallel worker, until we've seen a sign of life on every worker's
error queue. Untested illustration code only. This is the only way
I've come up with to discover fork failure in any latch/CFI loop (i.e.
without requiring client code to explicitly try to read either error
or tuple queues).
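
To make the shape of the idea concrete, here is a toy model of the check, not the attached patch. Every name in it (Worker, ParallelCtx, check_fork_failure_on_wake, the status enum) is hypothetical and stands in for the real parallel.c and bgworker machinery; "attached" models a worker having shown a sign of life on its error queue, and "status" models what the postmaster would report if asked. It only illustrates the pessimistic part: on each latch wake, every worker not yet known to be alive is re-examined, and once all workers have been seen the check costs nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real parallel.c structures. */
typedef enum
{
    WORKER_STARTING,        /* fork requested, nothing heard yet */
    WORKER_RUNNING,         /* process exists */
    WORKER_FORK_FAILED      /* postmaster could not fork() it */
} WorkerStatus;

typedef struct
{
    WorkerStatus status;    /* what asking the postmaster would tell us */
    bool        attached;   /* sign of life seen on this worker's error queue */
    bool        known_alive;/* leader's cached conclusion */
} Worker;

typedef struct
{
    Worker     *workers;
    size_t      nworkers;
    size_t      nknown_alive;   /* workers we no longer need to check */
    size_t      status_checks;  /* counts the expensive per-wake lookups */
} ParallelCtx;

/*
 * Called on every latch wake.  Returns the index of the first worker found
 * to have failed to start, or -1 if no failure was detected.
 */
int
check_fork_failure_on_wake(ParallelCtx *ctx)
{
    assert(ctx != NULL);

    /* Fast path: every worker has already shown a sign of life. */
    if (ctx->nknown_alive == ctx->nworkers)
        return -1;

    for (size_t i = 0; i < ctx->nworkers; i++)
    {
        Worker     *w = &ctx->workers[i];

        if (w->known_alive)
            continue;

        if (w->attached)
        {
            /* Sign of life: never check this worker again. */
            w->known_alive = true;
            ctx->nknown_alive++;
            continue;
        }

        /* Pessimistic case: this wake might mean fork() failed. */
        ctx->status_checks++;
        if (w->status == WORKER_FORK_FAILED)
            return (int) i;
    }

    return -1;
}
```

In this model the leader would raise an error when the function returns a worker index. The cost is one status lookup per not-yet-seen worker per wake, which is the "expensive" part admitted above.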

--
Thomas Munro
http://www.enterprisedb.com

Attachment	Content-Type	Size
fork-failure-detection-idea.patch	application/octet-stream	3.2 KB

In response to

Re: [HACKERS] parallel.c oblivion of worker-startup failures at 2018-01-24 04:25:30 from Thomas Munro

Responses

Re: [HACKERS] parallel.c oblivion of worker-startup failures at 2018-01-24 20:05:01 from Peter Geoghegan
