Re: Out of Memory errors are frustrating as heck!

From: Gunther <raj(at)gusw(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Subject: Re: Out of Memory errors are frustrating as heck!
Date: 2019-04-16 06:33:19
Message-ID: 867396ca-9df7-c873-b095-5df85eaafb71@gusw.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

It is confirmed, these two call paths are the only ones. At least
probably the only ones to occur with enough of a frequency.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28576 postgres 20 0 2695304 1.0g 200764 R 11.3 13.8 4:20.13 postgres: postgres integrator [local] EXPLAIN
28580 postgres 20 0 646616 432784 36968 S 98.7 5.5 8:53.28 gdb -p 28576

there is a problem with gdb, it also has a memoy leak and is very
expensive with the checking of my conditional breakpoint. So I can't run
it all the way through.

Also here captured with

(gdb) call MemoryContextStats(TopPortalContext)

TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used
PortalHoldContext: 24632 total in 2 blocks; 7392 free (0 chunks); 17240 used
PortalContext: 1482752 total in 184 blocks; 11216 free (8 chunks); 1471536 used:
ExecutorState: 1369337168 total in 163397 blocks; 248840 free (36 chunks); 1369088328 used
HashTableContext: 32768 total in 3 blocks; 17304 free (10 chunks); 15464 used
HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
HashTableContext: 8192 total in 1 blocks; 7320 free (0 chunks); 872 used
HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
HashTableContext: 8192 total in 1 blocks; 7320 free (0 chunks); 872 used
HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
HashTableContext: 8192 total in 1 blocks; 7752 free (0 chunks); 440 used
HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
HashTableContext: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used
HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
TupleSort main: 32824 total in 2 blocks; 144 free (0 chunks); 32680 used
Caller tuples: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
HashTableContext: 8454256 total in 6 blocks; 64848 free (32 chunks); 8389408 used
HashBatchContext: 106640 total in 3 blocks; 7936 free (0 chunks); 98704 used
TupleSort main: 452880 total in 8 blocks; 126248 free (27 chunks); 326632 used
Caller tuples: 4194304 total in 10 blocks; 1496136 free (20 chunks); 2698168 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
...
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
Grand total: 1384601904 bytes in 163660 blocks; 2303840 free (145 chunks); 1382298064 used

there is the developing memory leak. Now let's see if we can trace
individual increments ...

(gdb) info break
Num Type Disp Enb Address What
1 breakpoint keep y 0x0000000000849030 in AllocSetAlloc at aset.c:718
stop only if (int)strcmp(context->name, "ExecutorState") == 0 && *(int *)$rsp != 0x84e7dd && 0x84e8ad != *(int *)$rsp
breakpoint already hit 4 times
(gdb) delete 1
(gdb) break AllocSetAlloc if (int)strcmp(context->name, "ExecutorState") == 0 && *(int *)$rsp != 0x84e7dd
Breakpoint 2 at 0x849030: file aset.c, line 718.
(gdb) cont
Continuing.
^CError in testing breakpoint condition:
Quit

Breakpoint 2, AllocSetAlloc (context=0x2a1d190, size=381) at aset.c:718
718 {
(gdb) bt 4
#0 AllocSetAlloc (context=0x2a1d190, size=381) at aset.c:718
#1 0x000000000084e7dd in palloc (size=381) at mcxt.c:938
#2 0x00000000006101bc in ExecHashJoinGetSavedTuple (file=file(at)entry=0x4b4a198, hashvalue=hashvalue(at)entry=0x7ffcbf92fe5c,
tupleSlot=0x2ae0ab8, hjstate=0x2a1d920) at nodeHashjoin.c:1277
#3 0x0000000000610ca3 in ExecHashJoinNewBatch (hjstate=0x2a1d920) at nodeHashjoin.c:1042
(More stack frames follow...)
(gdb) call MemoryContextStats(TopPortalContext)

doesn't show an increase of ExecutorState total:

TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used
PortalHoldContext: 24632 total in 2 blocks; 7392 free (0 chunks); 17240 used
PortalContext: 1482752 total in 184 blocks; 11216 free (8 chunks); 1471536 used:
ExecutorState: 1369337168 total in 163397 blocks; 248840 free (36 chunks); 1369088328 used

exact same as before:

ExecutorState: 1369337168 total in 163397 blocks; 248840 free (36 chunks); 1369088328 used

but now we get an increase to:

ExecutorState: 1369345496 total in 163398 blocks; 248840 free (36 chunks); 1369096656 used

after I did this:

(gdb) cont
Continuing.

Breakpoint 2, AllocSetAlloc (context=0x2a1d190, size=8) at aset.c:718
718 {
(gdb) bt 4
#0 AllocSetAlloc (context=0x2a1d190, size=8) at aset.c:718
#1 0x000000000084e8ad in palloc0 (size=size(at)entry=8) at mcxt.c:969
#2 0x0000000000702b63 in makeBufFileCommon (nfiles=nfiles(at)entry=1) at buffile.c:119
#3 0x0000000000702e4c in makeBufFile (firstfile=163423) at buffile.c:138
(More stack frames follow...)
(gdb) call MemoryContextStats(TopPortalContext)

So now we have it confirmed don't we? No! No we have not! We stop at the
/entrance /of the allocate method. So when I interrupted, there was no
call yet. Then at the next stop the increase was from the previous.
Continuing ... this now is from a stop at the makeBufFileCommon

ExecutorState: 1369345496 total in 163398 blocks; 248816 free (36 chunks); 1369096680 used

And again, now stopped before

ExecutorState: 1369345496 total in 163398 blocks; 248792 free (36 chunks); 1369096704 used

ExecutorState: 1369345496 total in 163398 blocks; 248792 free (36 chunks); 1369096704 used

I don't see a growth between individual invocations. Anyway, these are
the two ways to get there:

(gdb) bt 4
#0 AllocSetAlloc (context=0x2a1d190, size=4) at aset.c:718
#1 0x000000000084e7dd in palloc (size=size(at)entry=4) at mcxt.c:938
#2 0x0000000000702e59 in makeBufFile (firstfile=163423) at buffile.c:140
#3 BufFileCreateTemp (interXact=interXact(at)entry=false) at buffile.c:201
(More stack frames follow...)
(gdb) cont
Continuing.

Breakpoint 3, AllocSetAlloc (context=0x2a1d190, size=394) at aset.c:718
718 {
(gdb) bt 3
#0 AllocSetAlloc (context=0x2a1d190, size=394) at aset.c:718
#1 0x000000000084e7dd in palloc (size=394) at mcxt.c:938
#2 0x00000000006101bc in ExecHashJoinGetSavedTuple (file=file(at)entry=0x4b4a198, hashvalue=hashvalue(at)entry=0x7ffcbf92fe5c,
tupleSlot=0x2ae0ab8, hjstate=0x2a1d920) at nodeHashjoin.c:1277
(More stack frames follow...)

But now it increased

ExecutorState: 1369353824 total in 163399 blocks; 248792 free (36 chunks); 1369105032 used

It increases every 3 times I stop at the breakpoint.

-Gunther

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2019-04-16 15:30:19 Re: Out of Memory errors are frustrating as heck!
Previous Message Gunther 2019-04-16 05:23:45 Re: Out of Memory errors are frustrating as heck!