From: Gunther <raj(at)gusw(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gunther <raj(at)gusw(dot)net>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Out of Memory errors are frustrating as heck!
Date: 2019-04-21 05:03:50
Message-ID: f6d4f3be-44f8-0c45-de0d-e68caef9bb60@gusw.net
Lists: pgsql-performance
On 4/20/2019 21:14, Tomas Vondra wrote:
> Maybe. But before wasting any more time on the memory leak investigation,
> I suggest you first try the patch moving the BufFile allocations to a
> separate context. That'll either confirm or disprove the theory.
OK, fair enough. So, with the first patch 0001-* applied and the server recompiled, I ran the query again, and:
2019-04-21 04:08:04.364 UTC [11304] LOG: server process (PID 11313) was terminated by signal 11: Segmentation fault
2019-04-21 04:08:04.364 UTC [11304] DETAIL: Failed process was running: explain analyze select * from reports.v_BusinessOperation;
2019-04-21 04:08:04.364 UTC [11304] LOG: terminating any other active server processes
2019-04-21 04:08:04.368 UTC [11319] FATAL: the database system is in recovery mode
2019-04-21 04:08:04.368 UTC [11315] WARNING: terminating connection because of crash of another server process
SIGSEGV ... and with the core dump that I have, I can tell you exactly where:
Core was generated by `postgres: postgres integrator'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000009f300c in palloc (size=8272) at mcxt.c:936
936 context->isReset = false;
(gdb) bt
#0 0x00000000009f300c in palloc (size=8272) at mcxt.c:936
#1 0x000000000082b068 in makeBufFileCommon (nfiles=1) at buffile.c:116
#2 0x000000000082b0f8 in makeBufFile (firstfile=73) at buffile.c:138
#3 0x000000000082b283 in BufFileCreateTemp (interXact=false) at buffile.c:201
#4 0x00000000006bdc15 in ExecHashJoinSaveTuple (tuple=0x1c5a468, hashvalue=3834163156, fileptr=0x18a3730) at nodeHashjoin.c:1227
#5 0x00000000006b9568 in ExecHashTableInsert (hashtable=0x188fb88, slot=0x1877a18, hashvalue=3834163156) at nodeHash.c:1701
#6 0x00000000006b6c39 in MultiExecPrivateHash (node=0x1862168) at nodeHash.c:186
#7 0x00000000006b6aec in MultiExecHash (node=0x1862168) at nodeHash.c:114
#8 0x00000000006a19cc in MultiExecProcNode (node=0x1862168) at execProcnode.c:501
#9 0x00000000006bc5d2 in ExecHashJoinImpl (pstate=0x17b90e0, parallel=false) at nodeHashjoin.c:290
...
(gdb) info frame
Stack level 0, frame at 0x7fffd5d4dc80:
rip = 0x9f300c in palloc (mcxt.c:936); saved rip = 0x82b068
called by frame at 0x7fffd5d4dcb0
source language c.
Arglist at 0x7fffd5d4dc70, args: size=8272
Locals at 0x7fffd5d4dc70, Previous frame's sp is 0x7fffd5d4dc80
Saved registers:
rbx at 0x7fffd5d4dc60, rbp at 0x7fffd5d4dc70, r12 at 0x7fffd5d4dc68, rip at 0x7fffd5d4dc78
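Reading the backtrace bottom-up: the hash build (MultiExecPrivateHash) hits a tuple that has to be spilled to a later batch, ExecHashTableInsert hands it to ExecHashJoinSaveTuple, and the very first palloc() inside BufFileCreateTemp is what dies. For orientation, this is roughly the shape of that spill path in the stock nodeHashjoin.c (a simplified sketch from memory, not the patched code; types come from the PostgreSQL headers, and presumably the 0001 patch wraps this BufFileCreateTemp() call, or the allocations inside it, in its separate BufFile context):

void
ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
                      BufFile **fileptr)
{
    BufFile    *file = *fileptr;

    if (file == NULL)
    {
        /* first write to this batch file, so open it */
        file = BufFileCreateTemp(false);        /* frame #3, where palloc() blows up */
        *fileptr = file;
    }

    BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
    BufFileWrite(file, (void *) tuple, tuple->t_len);
}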
I have confirmed that this happens while executing the main T_HashJoin with jointype JOIN_RIGHT.
So now I am assuming that perhaps you want both of these patches applied. I applied the second one as well, retried, and boom: same crash, same place.
It turns out the MemoryContext is NULL:
(gdb) p context
$1 = (MemoryContext) 0x0
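That is consistent with the crash site: palloc() in mcxt.c loads CurrentMemoryContext and immediately stores into it, so a NULL current context faults on exactly the line the core dump shows (context->isReset = false). My guess, and it is only a guess, is that the patched spill path switches into a batch-file context that was never created on this code path. A tiny standalone sketch of the failure mode (simplified made-up types, not the real mcxt.c):

#include <stdbool.h>
#include <stddef.h>

/* grossly simplified stand-in for MemoryContextData, not the real struct */
typedef struct MemoryContextData
{
    bool        isReset;
    /* methods, parent, name, ... elided */
} MemoryContextData, *MemoryContext;

/* nothing ever switched a context in, so this stays NULL */
static MemoryContext CurrentMemoryContext = NULL;

static void *
palloc_sketch(size_t size)
{
    MemoryContext context = CurrentMemoryContext;

    /*
     * An assert-enabled build would trip a validity assertion first; a
     * production build falls straight through to this store, which is the
     * SIGSEGV at mcxt.c:936 when context == 0x0.
     */
    context->isReset = false;       /* NULL pointer dereference */

    (void) size;
    return NULL;                    /* the real palloc() calls context->methods->alloc here */
}

int
main(void)
{
    palloc_sketch(8272);            /* same request size as in frame #0 */
    return 0;
}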
All patches applied cleanly (with the -p1 option). I see no .rej files, but also no .orig files; I am not sure why that version of patch didn't create them. But I paid attention and know that there were no errors.
-Gunther