From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Mark Wong <mark(at)2ndquadrant(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: stress test for parallel workers |
Date: | 2019-10-11 19:41:12 |
Message-ID: | CA+hUKGKNgufn12Uh4iEh5y=JkEnUBWnLLmi8L4zwzMunFeKwSA@mail.gmail.com |
Lists: | pgsql-hackers |
On Sat, Oct 12, 2019 at 7:56 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> This matches up with the intermittent infinite_recurse failures
> we've been seeing in the buildfarm. Those are happening across
> a range of systems, but they're (almost) all Linux-based ppc64,
> suggesting that there's a longstanding arch-specific kernel bug
> involved. For reference, I scraped the attached list of such
> failures in the last three months. I wonder whether we can get
> the attention of any kernel hackers about that.
Yeah, I don't know anything about this stuff, but I was also beginning
to wonder if something is busted in the arch-specific fault.c code
that checks if stack expansion is valid[1], in a way that fails with a
rapidly growing stack, well-timed incoming signals, and perhaps
Docker/LXC (that's on Mark's systems IIUC; I'm not sure about the ARM
boxes that failed, or whether it could be relevant here). Perhaps the
arbitrary tolerances mentioned in that comment are relevant.
[1] https://github.com/torvalds/linux/blob/master/arch/powerpc/mm/fault.c#L244