Re: intermittent failures in Cygwin from select_parallel tests

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: intermittent failures in Cygwin from select_parallel tests
Date: 2017-06-15 14:05:56
Message-ID: 30867.1497535556@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Jun 14, 2017 at 6:01 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The lack of any other message before the 'could not map' failure must,
>> then, mean that dsm_attach() couldn't find an entry in shared memory
>> that it wanted to attach to. But how could that happen?

> Well, as Amit points out, there are entirely legitimate ways for that
> to happen. If the leader finishes the whole query itself before the
> worker reaches the dsm_attach() call, it will call dsm_detach(),
> destroying the segment, and the worker will hit this ERROR. That
> shouldn't happen very often in the real world, because we ought not to
> select a parallel plan in the first place unless the query is going to
> take a while to run, but the select_parallel test quite deliberately
> disarms all of the guards that would tend to discourage such plans.

But we know, from the subsequent failed assertion, that the leader was
still trying to launch parallel workers. So that particular theory
doesn't hold water.
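
For reference, the ordering Robert describes (leader detaches and destroys the segment before the worker attaches) can be sketched with a toy model. This is not PostgreSQL's dsm.c: the control table, the `toy_*` names, and the handle values are all hypothetical, and the "processes" are just sequential calls, so it only illustrates why a late attach finds no entry.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: NOT PostgreSQL's dsm.c. All names here are hypothetical. */
#define TOY_MAX_SEGMENTS 4

typedef struct ToySegment
{
    unsigned int handle;    /* 0 means the slot is free */
    int          refcnt;    /* number of attached "processes" */
} ToySegment;

static ToySegment toy_control[TOY_MAX_SEGMENTS];

/* Leader side: claim a free slot and hold one reference to it. */
static ToySegment *
toy_dsm_create(unsigned int handle)
{
    for (int i = 0; i < TOY_MAX_SEGMENTS; i++)
    {
        if (toy_control[i].handle == 0)
        {
            toy_control[i].handle = handle;
            toy_control[i].refcnt = 1;
            return &toy_control[i];
        }
    }
    return NULL;            /* no free slot */
}

/* Worker side: look the handle up; NULL means "no such segment". */
static ToySegment *
toy_dsm_attach(unsigned int handle)
{
    for (int i = 0; i < TOY_MAX_SEGMENTS; i++)
    {
        if (toy_control[i].handle == handle && toy_control[i].refcnt > 0)
        {
            toy_control[i].refcnt++;
            return &toy_control[i];
        }
    }
    return NULL;
}

/* Drop one reference; the last detach destroys the segment. */
static void
toy_dsm_detach(ToySegment *seg)
{
    assert(seg->refcnt > 0);
    if (--seg->refcnt == 0)
        seg->handle = 0;
}

/* Returns 1 if a worker attaching after the leader's detach finds nothing. */
int
toy_late_attach_fails(void)
{
    ToySegment *leader = toy_dsm_create(42);

    if (leader == NULL)
        return 0;
    toy_dsm_detach(leader);             /* leader finished the query first */
    return toy_dsm_attach(42) == NULL;  /* worker arrives too late */
}

/* Returns 1 if a worker attaching while the leader is attached succeeds. */
int
toy_early_attach_succeeds(void)
{
    ToySegment *leader = toy_dsm_create(43);
    ToySegment *worker;
    int         ok;

    if (leader == NULL)
        return 0;
    worker = toy_dsm_attach(43);
    ok = (worker != NULL);
    if (worker != NULL)
        toy_dsm_detach(worker);
    toy_dsm_detach(leader);
    return ok;
}
```

In this model the late attach fails only because the leader has already dropped the last reference, which is the precondition the failed assertion argues against in the reported case.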

> Of course, as Amit also points out, it could also be the result of
> some bug, but I'm not sure we have any reason to think so.

The fact that we've only seen this on Cygwin leads the mind in the
direction of platform-specific problems. Both this case and lorikeet's
earlier symptoms could be explained if the parameters passed from leader
to workers somehow got corrupted occasionally; so that's what I've been
thinking about, but I'm not seeing anything.

Would someone confirm my recollection that the Cygwin build does *not*
use EXEC_BACKEND, but relies on a Cygwin-provided emulation of fork()?
How does that emulation work, anyway?
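
To make the distinction concrete, here is a minimal sketch of what a plain POSIX fork() gives the child: a copy of the parent's address space, so state set before the fork is visible without being passed explicitly. It says nothing about how Cygwin's emulation achieves this, which is the open question; `parallel_param` and `child_sees_parent_state` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for state the leader sets up before forking. */
static int parallel_param = 0;

/*
 * Returns 1 if the forked child observed the value the parent stored
 * before fork(), 0 if it did not, and -1 if fork() or waitpid() failed.
 */
int
child_sees_parent_state(void)
{
    pid_t pid;
    int   status;

    parallel_param = 7;     /* set in the parent, before the fork */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        /* Child: report through the exit status what it sees. */
        _exit(parallel_param == 7 ? 0 : 1);
    }

    /* Only the parent reaches this point. */
    assert(pid > 0);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

In a build where the child is started as a fresh executable instead, none of this state would be inherited this way and it would have to be handed over explicitly, which is why the answer matters for the corrupted-parameters theory.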

regards, tom lane
