
Re: stress test for parallel workers

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject:	Re: stress test for parallel workers
Date:	2019-07-23 17:28:47
Message-ID:	6762.1563902927@sss.pgh.pa.us
Lists:	pgsql-hackers

Justin Pryzby <pryzby(at)telsasoft(dot)com> writes:
> Does anyone have a stress test for parallel workers?
> On a customer's new VM, I got this several times while (trying to) migrate their DB:

> < 2019-07-23 10:33:51.552 CDT postgres >FATAL: postmaster exited during a parallel transaction

We've been seeing this irregularly in the buildfarm, too. I've been
suspicious that it's from an OOM kill on the postmaster in the
buildfarm cases, but ...

> There's nothing in dmesg nor in postgres logs.

... you'd think an OOM kill would show up in the kernel log.
(Not necessarily in dmesg, though. Did you try syslog?)
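One way to act on the syslog suggestion is to scan the kernel log for OOM-killer victims. The sketch below is hypothetical, not a PostgreSQL or kernel tool. It assumes the victim line looks like "Out of memory: Kill process N (comm)" (older kernels) or "Out of memory: Killed process N (comm)" (newer kernels). The exact wording varies by kernel version, so treat the regex as a starting point.

```python
import re
from typing import Iterable, List, Tuple

# Assumed message shapes (wording differs across kernel versions):
#   "Out of memory: Kill process 1234 (postgres) score 900 or sacrifice child"
#   "Out of memory: Killed process 1234 (postgres) total-vm:..."
# "Memory cgroup out of memory: Killed process ..." also matches (case-insensitive).
OOM_RE = re.compile(r"out of memory: kill(?:ed)? process (\d+) \(([^)]*)\)", re.IGNORECASE)


def find_oom_kills(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, command name) for every OOM-killer victim found in lines."""
    victims = []
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            victims.append((int(m.group(1)), m.group(2)))
    return victims
```

For example, `find_oom_kills(open("/var/log/syslog"))` would scan a Debian-style syslog. The log path differs between distributions, and on systemd hosts you could feed it the output of `journalctl -k` instead. A hit whose pid matches the postmaster's would support the OOM theory.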

> Ideally a minimal test, since I'm apparently going to
> have to run under gdb to see how it's dying, or even what process is failing.

Like it told you, it's the postmaster that's going away.
That's Not Supposed To Happen, of course, but unfortunately Linux's
OOM-kill heuristic preferentially targets the postmaster when
its children are consuming too many resources.

If that is the problem, there's some info on working around it at

https://www.postgresql.org/docs/current/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
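The linked page describes two mitigations. The first is strict overcommit (`vm.overcommit_memory=2`). The second is starting the postmaster with `oom_score_adj` set to -1000, which exempts that process from the OOM killer. The sketch below is a hypothetical helper, not part of PostgreSQL. It reads both settings from `/proc` and reports which mitigations are in place. An empty result means neither applies, so the postmaster is still an OOM-kill candidate.

```python
from pathlib import Path
from typing import List


def read_int(path: Path) -> int:
    """Read a single integer from a /proc-style file."""
    return int(path.read_text().strip())


def mitigations(overcommit_memory: int, postmaster_oom_score_adj: int) -> List[str]:
    """List the OOM mitigations in effect; an empty list means the postmaster is exposed."""
    found = []
    if overcommit_memory == 2:
        found.append("strict overcommit (vm.overcommit_memory=2)")
    if postmaster_oom_score_adj == -1000:
        found.append("postmaster exempt (oom_score_adj=-1000)")
    return found


def check_postmaster(pid: int, proc_root: str = "/proc") -> List[str]:
    """Read the live settings for the given postmaster pid (Linux only)."""
    root = Path(proc_root)
    return mitigations(
        read_int(root / "sys" / "vm" / "overcommit_memory"),
        read_int(root / str(pid) / "oom_score_adj"),
    )
```

The postmaster's pid is the first line of `postmaster.pid` in the data directory. For example, you could call `check_postmaster(int(open("/path/to/data/postmaster.pid").readline()))`, where the data directory path is a placeholder. Strict overcommit lowers the odds of an OOM kill rather than ruling it out, so the two checks are reported separately instead of as a single pass/fail.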

regards, tom lane

In response to

stress test for parallel workers at 2019-07-23 16:27:03 from Justin Pryzby

Responses

Re: stress test for parallel workers at 2019-07-23 17:42:22 from Justin Pryzby
