From: | Mark Wong <mark(at)2ndQuadrant(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: stress test for parallel workers |
Date: | 2019-10-11 20:28:53 |
Message-ID: | 20191011202853.GA23809@2ndQuadrant.com |
Lists: | pgsql-hackers |
On Sat, Oct 12, 2019 at 08:41:12AM +1300, Thomas Munro wrote:
> On Sat, Oct 12, 2019 at 7:56 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > This matches up with the intermittent infinite_recurse failures
> > we've been seeing in the buildfarm. Those are happening across
> > a range of systems, but they're (almost) all Linux-based ppc64,
> > suggesting that there's a longstanding arch-specific kernel bug
> > involved. For reference, I scraped the attached list of such
> > failures in the last three months. I wonder whether we can get
> > the attention of any kernel hackers about that.
>
> Yeah, I don't know anything about this stuff, but I was also beginning
> to wonder if something is busted in the arch-specific fault.c code
> that checks if stack expansion is valid[1], in a way that fails with a
> rapidly growing stack, well timed incoming signals, and perhaps
> Docker/LXC (that's on Mark's systems IIUC, not sure about the ARM
> boxes that failed or if it could be relevant here). Perhaps the
> arbitrary tolerances mentioned in that comment are relevant.
This specific one (wobbegon) is OpenStack/KVM[2], for what it's worth...
"... cluster is an OpenStack based cluster offering POWER8 & POWER9 LE
instances running on KVM ..."
But to keep you on your toes, some of my ppc animals run in Docker
containers inside other OpenStack/KVM instances...
Regards,
Mark
[1] https://github.com/torvalds/linux/blob/master/arch/powerpc/mm/fault.c#L244
[2] https://osuosl.org/services/powerdev/
--
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/