From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com>
Cc: Forums postgresql <pgsql-general(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: ERROR: too many dynamic shared memory segments
Date: 2017-11-27 22:48:54
Message-ID: CAEepm=0kADK5inNf_KuemjX=HQ=PuTP0DykM--fO5jS5ePVFEA@mail.gmail.com
Lists: pgsql-general, pgsql-hackers
On Tue, Nov 28, 2017 at 10:05 AM, Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com> wrote:
> As for the crash. I dug up the initial log and it looks like a segmentation
> fault...
>
> 2017-11-23 07:26:53 CET:192.168.10.83(35238):user(at)db:[30003]: ERROR: too
> many dynamic shared memory segments
Hmm. Well, this error can only occur in dsm_create() when it is called
without DSM_CREATE_NULL_IF_MAXSEGMENTS. parallel.c calls it with that
flag and dsa.c doesn't (perhaps it should, not sure, but that'd just
change the error message), so the error must have arisen from dsa.c
trying to get more segments. That would be when Parallel Bitmap Heap
Scan tried to allocate memory.
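For what it's worth, the difference between the two call sites can be
sketched like this. This is a toy simulation, not the real dsm.c code:
only the flag name and the error text come from the server, the slot
table and sim_dsm_create() are made up for illustration.

```c
#include <stdio.h>

/* Toy stand-in for PostgreSQL's fixed-size DSM control slot table.
 * dsm_create() either raises "too many dynamic shared memory segments"
 * or, when called with DSM_CREATE_NULL_IF_MAXSEGMENTS (as parallel.c
 * does), hands back NULL so the caller can fall back gracefully. */

#define MAX_SEGMENTS 5
#define DSM_CREATE_NULL_IF_MAXSEGMENTS 0x01

static int segments_in_use = 0;

/* Returns a fake handle (> 0), 0 for "NULL segment", or -1 for error. */
static int
sim_dsm_create(int flags)
{
    if (segments_in_use >= MAX_SEGMENTS)
    {
        if (flags & DSM_CREATE_NULL_IF_MAXSEGMENTS)
            return 0;           /* caller must cope with a NULL segment */
        fprintf(stderr, "ERROR: too many dynamic shared memory segments\n");
        return -1;              /* the server would ereport(ERROR) here */
    }
    return ++segments_in_use;
}
```

So a dsa.c-style caller (no flag) hits the error, while a parallel.c-style
caller gets NULL back and is expected to run the plan without workers.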
I hacked my copy of PostgreSQL so that it allows only 5 DSM slots and
managed to reproduce a segv crash by trying to run concurrent Parallel
Bitmap Heap Scans. The stack looks like this:
* frame #0: 0x00000001083ace29
postgres`alloc_object(area=0x0000000000000000, size_class=10) + 25 at
dsa.c:1433
frame #1: 0x00000001083acd14
postgres`dsa_allocate_extended(area=0x0000000000000000, size=72,
flags=4) + 1076 at dsa.c:785
frame #2: 0x0000000108059c33
postgres`tbm_prepare_shared_iterate(tbm=0x00007f9743027660) + 67 at
tidbitmap.c:780
frame #3: 0x0000000108000d57
postgres`BitmapHeapNext(node=0x00007f9743019c88) + 503 at
nodeBitmapHeapscan.c:156
frame #4: 0x0000000107fefc5b
postgres`ExecScanFetch(node=0x00007f9743019c88,
accessMtd=(postgres`BitmapHeapNext at nodeBitmapHeapscan.c:77),
recheckMtd=(postgres`BitmapHeapRecheck at nodeBitmapHeapscan.c:710)) +
459 at execScan.c:95
frame #5: 0x0000000107fef983
postgres`ExecScan(node=0x00007f9743019c88,
accessMtd=(postgres`BitmapHeapNext at nodeBitmapHeapscan.c:77),
recheckMtd=(postgres`BitmapHeapRecheck at nodeBitmapHeapscan.c:710)) +
147 at execScan.c:162
frame #6: 0x00000001080008d1
postgres`ExecBitmapHeapScan(pstate=0x00007f9743019c88) + 49 at
nodeBitmapHeapscan.c:735
(lldb) f 3
frame #3: 0x0000000108000d57
postgres`BitmapHeapNext(node=0x00007f9743019c88) + 503 at
nodeBitmapHeapscan.c:156
153 * dsa_pointer of the iterator state which will be used by
154 * multiple processes to iterate jointly.
155 */
-> 156 pstate->tbmiterator = tbm_prepare_shared_iterate(tbm);
157 #ifdef USE_PREFETCH
158 if (node->prefetch_maximum > 0)
159
(lldb) print tbm->dsa
(dsa_area *) $3 = 0x0000000000000000
(lldb) print node->ss.ps.state->es_query_dsa
(dsa_area *) $5 = 0x0000000000000000
(lldb) f 17
frame #17: 0x000000010800363b
postgres`ExecGather(pstate=0x00007f9743019320) + 635 at
nodeGather.c:220
217 * Get next tuple, either from one of our workers, or by running the plan
218 * ourselves.
219 */
-> 220 slot = gather_getnext(node);
221 if (TupIsNull(slot))
222 return NULL;
223
(lldb) print *node->pei
(ParallelExecutorInfo) $8 = {
planstate = 0x00007f9743019640
pcxt = 0x00007f97450001b8
buffer_usage = 0x0000000108b7e218
instrumentation = 0x0000000108b7da38
area = 0x0000000000000000
param_exec = 0
finished = '\0'
tqueue = 0x0000000000000000
reader = 0x0000000000000000
}
(lldb) print *node->pei->pcxt
warning: could not load any Objective-C class information. This will
significantly reduce the quality of type information available.
(ParallelContext) $9 = {
node = {
prev = 0x000000010855fb60
next = 0x000000010855fb60
}
subid = 1
nworkers = 0
nworkers_launched = 0
library_name = 0x00007f9745000248 "postgres"
function_name = 0x00007f9745000268 "ParallelQueryMain"
error_context_stack = 0x0000000000000000
estimator = (space_for_chunks = 180352, number_of_keys = 19)
seg = 0x0000000000000000
private_memory = 0x0000000108b53038
toc = 0x0000000108b53038
worker = 0x0000000000000000
}
I think there are two failure modes: one of your sessions raised the
"too many ..." error (that's good: it ran out of slots, said so, and
our error machinery worked as it should), and another crashed with a
segfault because it tried to use a NULL "area" pointer (bad). I think
this is a degenerate case in which we completely failed to launch the
parallel query, but we ran the parallel query plan anyway, and this
code assumed the DSA was available. Oops.
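The defensive check I have in mind would look roughly like this. Purely a
hypothetical sketch: FakeDsaArea and choose_iterator() are stand-ins for
illustration, not server code, but the idea is that when es_query_dsa is
NULL the scan must stay with a backend-local iterator instead of calling
tbm_prepare_shared_iterate().

```c
#include <stddef.h>

/* If parallel-context creation failed, es_query_dsa is left NULL, so
 * the bitmap heap scan should fall back to a backend-local iterator
 * rather than dereferencing a NULL DSA area and crashing. */

typedef struct FakeDsaArea { int dummy; } FakeDsaArea;

typedef enum IterKind { ITER_SHARED, ITER_LOCAL } IterKind;

/* Stand-in for the choice BitmapHeapNext would have to make. */
static IterKind
choose_iterator(FakeDsaArea *es_query_dsa)
{
    if (es_query_dsa == NULL)
        return ITER_LOCAL;      /* parallel launch failed; stay local */
    return ITER_SHARED;         /* DSA available; build shared iterator */
}
```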
--
Thomas Munro
http://www.enterprisedb.com