Re: stress test for parallel workers

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: stress test for parallel workers
Date:	2019-07-24 05:15:14
Message-ID:	17389.1563945314@sss.pgh.pa.us
Lists:	pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> On Wed, Jul 24, 2019 at 10:11 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> In any case, the evidence from the buildfarm is pretty clear that
>> there is *some* connection. We've seen a lot of recent failures
>> involving "postmaster exited during a parallel transaction", while
>> the number of postmaster failures not involving that is epsilon.

> I don't have access to the build farm history in searchable format
> (I'll go and ask for that).

Yeah, it's definitely handy to be able to do SQL searches in the
history. I forget whether Dunstan or Frost is the person to ask
for access, but there's no reason you shouldn't have it.

> Do you have an example to hand? Is this
> failure always happening on Linux?

I dug around a bit further, and while my recollection of a lot of
"postmaster exited during a parallel transaction" failures is accurate,
there is a very strong correlation I'd not noticed: it's just a few
buildfarm critters that are producing those. To wit, I find that
string in these recent failures (checked all runs in the past 3 months):

We already knew that lorikeet has its own peculiar stability
problems, and these other two critters run different compilers
on the same Fedora 27 ppc64le platform.

So I think I've got to take back the assertion that we've got
some lurking generic problem. This pattern looks way more
like a platform-specific issue. Overaggressive OOM killer
would fit the facts on vulpes/wobbegong, perhaps, though
it's odd that it only happens on HEAD runs.

regards, tom lane

In response to

Re: stress test for parallel workers at 2019-07-23 23:48:57 from Thomas Munro

Responses

Re: stress test for parallel workers at 2019-08-06 23:57:23 from Thomas Munro
