From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, mark(at)2ndquadrant(dot)com, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: stress test for parallel workers |
Date: | 2019-10-11 20:40:41 |
Message-ID: | 19525.1570826441@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2019-10-11 14:56:41 -0400, Tom Lane wrote:
>> ... So it's really hard to explain
>> that as anything except a kernel bug: sometimes, the kernel
>> doesn't give us as much stack as it promised it would. And the
>> machine is not loaded enough for there to be any rational
>> resource-exhaustion excuse for that.
> Linux expands stack space only on demand, thus it's possible to run out
> of stack space while there ought to be stack space. Unfortunately that
> happens during a stack expansion, which means there's no easy place to
> report that. I've seen this be hit in production on busy machines.
As I said, this machine doesn't seem busy enough for that to be a
tenable excuse; there's nobody but me logged in, and the buildfarm
critter isn't running.
> I wonder if the machine is configured with overcommit_memory=2,
> i.e. don't overcommit. cat /proc/sys/vm/overcommit_memory would tell.
$ cat /proc/sys/vm/overcommit_memory
0
> What does grep -E '^(Mem|Commit)' /proc/meminfo show while it's
> happening?
idle:
$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal: 2074816 kB
MemFree: 36864 kB
MemAvailable: 1779584 kB
CommitLimit: 1037376 kB
Committed_AS: 412480 kB
a few captures while regression tests are running:
$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal: 2074816 kB
MemFree: 8512 kB
MemAvailable: 1819264 kB
CommitLimit: 1037376 kB
Committed_AS: 371904 kB
$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal: 2074816 kB
MemFree: 32640 kB
MemAvailable: 1753792 kB
CommitLimit: 1037376 kB
Committed_AS: 585984 kB
$ grep -E '^(Mem|Commit)' /proc/meminfo
MemTotal: 2074816 kB
MemFree: 56640 kB
MemAvailable: 1695744 kB
CommitLimit: 1037376 kB
Committed_AS: 568768 kB
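For comparing captures like the ones above, here is a minimal sketch that parses /proc/meminfo-style text and reports the gap between CommitLimit and Committed_AS. The function names and the SAMPLE constant are hypothetical; the sample values are copied from the second capture taken while the regression tests were running. Note that with overcommit_memory set to 0 (heuristic mode, as on this machine) the kernel does not strictly enforce CommitLimit, so the result is only a rough indicator, not a hard bound.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def commit_headroom_kb(values):
    """Return CommitLimit minus Committed_AS, in kB."""
    return values["CommitLimit"] - values["Committed_AS"]


# Sample copied from one of the captures in this message.
SAMPLE = """\
MemTotal: 2074816 kB
MemFree: 32640 kB
MemAvailable: 1753792 kB
CommitLimit: 1037376 kB
Committed_AS: 585984 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print("commit headroom (kB):", commit_headroom_kb(info))
```

On a live system the same functions could be fed the contents of /proc/meminfo read with open("/proc/meminfo").read().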
> What does the signal information say? You can see it with
> p $_siginfo
> after receiving the signal. A SIGSEGV here, I assume.
(gdb) p $_siginfo
$1 = {si_signo = 11, si_errno = 0, si_code = 128, _sifields = {_pad = {0 <repeats 28 times>}, _kill = {si_pid = 0, si_uid = 0},
_timer = {si_tid = 0, si_overrun = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 0, si_uid = 0, si_sigval = {
sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 0, si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {
si_addr = 0x0}, _sigpoll = {si_band = 0, si_fd = 0}}}
> Yea, that seems like it might be good. But we have to be careful too, as
> there are some things where we do want to be interruptible from within a
> signal handler. We start some processes from within one after all...
The proposed patch has zero effect on what the signal mask will be inside
a signal handler, only on the transient state during handler entry/exit.
regards, tom lane