From: Gunther <raj(at)gusw(dot)net>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Gunther <raj(at)gusw(dot)net>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Out of Memory errors are frustrating as heck!
Date: 2019-04-20 20:00:18
Message-ID: e3457109-4500-6c59-10fb-0fa4405cc9e4@gusw.net
Lists: pgsql-performance
On Tue, Apr 16, 2019 at 11:46:51PM -0500, Justin Pryzby wrote:
>> I wonder if it'd be useful to compile with
>> ./configure CFLAGS=-DHJDEBUG=1
> Could you try this, too ?
OK, doing it now. Here is what I'm getting in the log file. I am surprised I get so few lines here:
2019-04-20 17:12:16.077 UTC [7093] LOG: database system was shut down at 2019-04-20 17:12:15 UTC
2019-04-20 17:12:16.085 UTC [7091] LOG: database system is ready to accept connections
Hashjoin 0x118e0c8: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x118e0f8: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x1194e78: initial nbatch = 1, nbuckets = 65536
Hashjoin 0x119b518: initial nbatch = 16, nbuckets = 65536
Hashjoin 0x1194e78: initial nbatch = 1, nbuckets = 65536
Hashjoin 0x119bb38: initial nbatch = 16, nbuckets = 65536
TopMemoryContext: 4347672 total in 9 blocks; 41784 free (19 chunks); 4305888 used
HandleParallelMessages: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1456 free (0 chunks); 6736 used
TopTransactionContext: 8192 total in 1 blocks; 5416 free (2 chunks); 2776 used
Operator lookup cache: 24576 total in 2 blocks; 10760 free (3 chunks); 13816 used
TableSpace cache: 8192 total in 1 blocks; 2096 free (0 chunks); 6096 used
Type information cache: 24352 total in 2 blocks; 2624 free (0 chunks); 21728 used
RowDescriptionContext: 8192 total in 1 blocks; 6896 free (0 chunks); 1296 used
MessageContext: 32768 total in 3 blocks; 13488 free (1 chunks); 19280 used
Operator class cache: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
smgr relation table: 32768 total in 3 blocks; 16832 free (8 chunks); 15936 used
TransactionAbortContext: 32768 total in 1 blocks; 32512 free (0 chunks); 256 used
Portal hash: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used
PortalHoldContext: 24632 total in 2 blocks; 7392 free (0 chunks); 17240 used
PortalContext: 1474560 total in 183 blocks; 6152 free (8 chunks); 1468408 used:
ExecutorState: 2234501600 total in 266274 blocks; 3696112 free (17274 chunks); 2230805488 used
HashTableContext: 32768 total in 3 blocks; 17272 free (8 chunks); 15496 used
HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
HashTableContext: 8192 total in 1 blocks; 7320 free (0 chunks); 872 used
HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
HashTableContext: 8192 total in 1 blocks; 7320 free (0 chunks); 872 used
HashBatchContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
HashTableContext: 8192 total in 1 blocks; 7752 free (0 chunks); 440 used
HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
HashTableContext: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used
HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used
TupleSort main: 286912 total in 8 blocks; 246792 free (39 chunks); 40120 used
TupleSort main: 286912 total in 8 blocks; 246792 free (39 chunks); 40120 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
HashTableContext: 8454256 total in 6 blocks; 64848 free (32 chunks); 8389408 used
HashBatchContext: 177003344 total in 5387 blocks; 7936 free (0 chunks); 176995408 used
TupleSort main: 452880 total in 8 blocks; 126248 free (27 chunks); 326632 used
Caller tuples: 1048576 total in 8 blocks; 21608 free (14 chunks); 1026968 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used
Relcache by OID: 16384 total in 2 blocks; 3512 free (2 chunks); 12872 used
CacheMemoryContext: 1101328 total in 14 blocks; 236384 free (1 chunks); 864944 used
index info: 2048 total in 2 blocks; 680 free (1 chunks); 1368 used: pg_toast_2619_index
index info: 2048 total in 2 blocks; 968 free (1 chunks); 1080 used: entity_id_fkidx
index info: 2048 total in 2 blocks; 696 free (1 chunks); 1352 used: entity_id_idx
index info: 2048 total in 2 blocks; 968 free (1 chunks); 1080 used: act_id_fkidx
...
index info: 2048 total in 2 blocks; 680 free (1 chunks); 1368 used: pg_attribute_relid_attnum_index
index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_class_oid_index
WAL record construction: 49768 total in 2 blocks; 6368 free (0 chunks); 43400 used
PrivateRefCount: 8192 total in 1 blocks; 2624 free (0 chunks); 5568 used
MdSmgr: 8192 total in 1 blocks; 6152 free (1 chunks); 2040 used
LOCALLOCK hash: 16384 total in 2 blocks; 4600 free (2 chunks); 11784 used
Timezones: 104120 total in 2 blocks; 2624 free (0 chunks); 101496 used
ErrorContext: 8192 total in 1 blocks; 7936 free (4 chunks); 256 used
Grand total: 2430295856 bytes in 272166 blocks; 5223104 free (17571 chunks); 2425072752 used
2019-04-20 17:28:56.887 UTC [7100] ERROR: out of memory
2019-04-20 17:28:56.887 UTC [7100] DETAIL: Failed on request of size 32800 in memory context "HashBatchContext".
2019-04-20 17:28:56.887 UTC [7100] STATEMENT: explain analyze SELECT * FROM reports.v_BusinessOperation;
There are amazingly few entries in the log.
(gdb) break ExecHashIncreaseNumBatches
Breakpoint 1 at 0x6b7bd6: file nodeHash.c, line 884.
(gdb) cont
Continuing.
Breakpoint 1, ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:884
884 int oldnbatch = hashtable->nbatch;
(gdb) cont
Continuing.
Breakpoint 1, ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:884
884 int oldnbatch = hashtable->nbatch;
(gdb) cont
Continuing.
Breakpoint 1, ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:884
884 int oldnbatch = hashtable->nbatch;
(gdb) cont 100
Will ignore next 99 crossings of breakpoint 1. Continuing.
But now I am surprised to find that the behavior has changed. First, weirdly enough, I am not getting any of the HJDEBUG messages any more. And yet my resident memory has already increased to 1.1 GB.
7100 postgres 20 0 2797036 1.1g 188064 R 17.9 14.9 14:46.00 postgres: postgres integrator [local] EXPLAIN
7664 postgres 20 0 1271788 16228 14408 D 17.6 0.2 0:01.96 postgres: parallel worker for PID 7100
7665 postgres 20 0 1271788 16224 14404 R 17.6 0.2 0:01.95 postgres: parallel worker for PID 7100
So why is it all different now? Are we chasing a red herring?
Finally I got another stop at the HJDEBUG line:
#0 ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:904
#1 0x00000000006b93b1 in ExecHashTableInsert (hashtable=0x12d0818, slot=0x12bcf98, hashvalue=234960700) at nodeHash.c:1655
#2 0x00000000006bd600 in ExecHashJoinNewBatch (hjstate=0x12a4340) at nodeHashjoin.c:1051
#3 0x00000000006bc999 in ExecHashJoinImpl (pstate=0x12a4340, parallel=false) at nodeHashjoin.c:539
#4 0x00000000006bca23 in ExecHashJoin (pstate=0x12a4340) at nodeHashjoin.c:565
#5 0x00000000006a191f in ExecProcNodeInstr (node=0x12a4340) at execProcnode.c:461
#6 0x00000000006ceaad in ExecProcNode (node=0x12a4340) at ../../../src/include/executor/executor.h:247
#7 0x00000000006cebe7 in ExecSort (pstate=0x12a4230) at nodeSort.c:107
#8 0x00000000006a191f in ExecProcNodeInstr (node=0x12a4230) at execProcnode.c:461
#9 0x00000000006a18f0 in ExecProcNodeFirst (node=0x12a4230) at execProcnode.c:445
#10 0x00000000006cf25c in ExecProcNode (node=0x12a4230) at ../../../src/include/executor/executor.h:247
#11 0x00000000006cf388 in ExecUnique (pstate=0x12a4040) at nodeUnique.c:73
#12 0x00000000006a191f in ExecProcNodeInstr (node=0x12a4040) at execProcnode.c:461
#13 0x00000000006a18f0 in ExecProcNodeFirst (node=0x12a4040) at execProcnode.c:445
#14 0x000000000069728b in ExecProcNode (node=0x12a4040) at ../../../src/include/executor/executor.h:247
#15 0x0000000000699790 in ExecutePlan (estate=0x12a3da0, planstate=0x12a4040, use_parallel_mode=true, operation=CMD_SELECT,
sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0xe811c0 <donothingDR>, execute_once=true)
at execMain.c:1723
#16 0x0000000000697757 in standard_ExecutorRun (queryDesc=0x1404168, direction=ForwardScanDirection, count=0, execute_once=true)
at execMain.c:364
#17 0x00000000006975f4 in ExecutorRun (queryDesc=0x1404168, direction=ForwardScanDirection, count=0, execute_once=true)
at execMain.c:307
#18 0x000000000060d227 in ExplainOnePlan (plannedstmt=0x1402588, into=0x0, es=0x10b03a8,
queryString=0x10866c0 "explain analyze SELECT * FROM reports.v_BusinessOperation;", params=0x0, queryEnv=0x0,
planduration=0x7fff56a0df00) at explain.c:535
So now I am back to checking who calls AllocSetAlloc:
(gdb) break AllocSetAlloc if (int)strcmp(context->name, "ExecutorState") == 0
Breakpoint 4 at 0x9eab11: file aset.c, line 719.
(gdb) cont
Continuing.
Breakpoint 4, AllocSetAlloc (context=0x12a3c90, size=8272) at aset.c:719
719 AllocSet set = (AllocSet) context;
(gdb) bt 5
#0 AllocSetAlloc (context=0x12a3c90, size=8272) at aset.c:719
#1 0x00000000009f2e47 in palloc (size=8272) at mcxt.c:938
#2 0x000000000082ae84 in makeBufFileCommon (nfiles=1) at buffile.c:116
#3 0x000000000082af14 in makeBufFile (firstfile=34029) at buffile.c:138
#4 0x000000000082b09f in BufFileCreateTemp (interXact=false) at buffile.c:201
(More stack frames follow...)
(gdb) bt 7
#0 AllocSetAlloc (context=0x12a3c90, size=8272) at aset.c:719
#1 0x00000000009f2e47 in palloc (size=8272) at mcxt.c:938
#2 0x000000000082ae84 in makeBufFileCommon (nfiles=1) at buffile.c:116
#3 0x000000000082af14 in makeBufFile (firstfile=34029) at buffile.c:138
#4 0x000000000082b09f in BufFileCreateTemp (interXact=false) at buffile.c:201
#5 0x00000000006bda31 in ExecHashJoinSaveTuple (tuple=0x86069b8, hashvalue=234960700, fileptr=0x8616a30) at nodeHashjoin.c:1220
#6 0x00000000006b80c0 in ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:1004
(More stack frames follow...)
(gdb) cont
Continuing.
Breakpoint 4, AllocSetAlloc (context=0x12a3c90, size=8) at aset.c:719
719 AllocSet set = (AllocSet) context;
(gdb) bt 7
#0 AllocSetAlloc (context=0x12a3c90, size=8) at aset.c:719
#1 0x00000000009f2f5d in palloc0 (size=8) at mcxt.c:969
#2 0x000000000082aea2 in makeBufFileCommon (nfiles=1) at buffile.c:119
#3 0x000000000082af14 in makeBufFile (firstfile=34029) at buffile.c:138
#4 0x000000000082b09f in BufFileCreateTemp (interXact=false) at buffile.c:201
#5 0x00000000006bda31 in ExecHashJoinSaveTuple (tuple=0x86069b8, hashvalue=234960700, fileptr=0x8616a30) at nodeHashjoin.c:1220
#6 0x00000000006b80c0 in ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:1004
(More stack frames follow...)
(gdb) cont
Continuing.
Breakpoint 4, AllocSetAlloc (context=0x12a3c90, size=4) at aset.c:719
719 AllocSet set = (AllocSet) context;
(gdb) bt 7
#0 AllocSetAlloc (context=0x12a3c90, size=4) at aset.c:719
#1 0x00000000009f2e47 in palloc (size=4) at mcxt.c:938
#2 0x000000000082af22 in makeBufFile (firstfile=34029) at buffile.c:140
#3 0x000000000082b09f in BufFileCreateTemp (interXact=false) at buffile.c:201
#4 0x00000000006bda31 in ExecHashJoinSaveTuple (tuple=0x86069b8, hashvalue=234960700, fileptr=0x8616a30) at nodeHashjoin.c:1220
#5 0x00000000006b80c0 in ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:1004
#6 0x00000000006b93b1 in ExecHashTableInsert (hashtable=0x12d0818, slot=0x12bcf98, hashvalue=234960700) at nodeHash.c:1655
(More stack frames follow...)
(gdb) cont
Continuing.
Breakpoint 4, AllocSetAlloc (context=0x12a3c90, size=375) at aset.c:719
719 AllocSet set = (AllocSet) context;
(gdb) bt 7
#0 AllocSetAlloc (context=0x12a3c90, size=375) at aset.c:719
#1 0x00000000009f2e47 in palloc (size=375) at mcxt.c:938
#2 0x00000000006bdbec in ExecHashJoinGetSavedTuple (hjstate=0x12a4340, file=0x13ca418, hashvalue=0x7fff56a0da54, tupleSlot=0x12bcf98)
at nodeHashjoin.c:1277
#3 0x00000000006bd61f in ExecHashJoinNewBatch (hjstate=0x12a4340) at nodeHashjoin.c:1042
#4 0x00000000006bc999 in ExecHashJoinImpl (pstate=0x12a4340, parallel=false) at nodeHashjoin.c:539
#5 0x00000000006bca23 in ExecHashJoin (pstate=0x12a4340) at nodeHashjoin.c:565
#6 0x00000000006a191f in ExecProcNodeInstr (node=0x12a4340) at execProcnode.c:461
(More stack frames follow...)
Now I am adding a breakpoint on AllocSetFree, and I see that it gets called right after AllocSetAlloc with the same context, and presumably with the pointer previously allocated.
(gdb) finish
Run till exit from #0 AllocSetAlloc (context=0x12a3c90, size=375) at aset.c:719
0x00000000009f2e47 in palloc (size=375) at mcxt.c:938
938 ret = context->methods->alloc(context, size);
Value returned is $1 = (void *) 0x11bd858
(gdb) cont
Continuing.
Breakpoint 5, AllocSetFree (context=0x12a3c90, pointer=0x11bf748) at aset.c:992
992 AllocSet set = (AllocSet) context;
(gdb) cont
Continuing.
Breakpoint 4, AllocSetAlloc (context=0x12a3c90, size=375) at aset.c:719
719 AllocSet set = (AllocSet) context;
(gdb) finish
Run till exit from #0 AllocSetAlloc (context=0x12a3c90, size=375) at aset.c:719
0x00000000009f2e47 in palloc (size=375) at mcxt.c:938
938 ret = context->methods->alloc(context, size);
Value returned is $2 = (void *) 0x11bf748
(gdb) cont
Continuing.
Breakpoint 5, AllocSetFree (context=0x12a3c90, pointer=0x11bd858) at aset.c:992
992 AllocSet set = (AllocSet) context;
See, now the pointer allocated before is being freed. I can already see how I would write a memory leak detection tool: keep a cache of a fixed size, say one page, of recently allocated pointers, and when free is called, remove the pointer from the cache. Then log an allocated pointer only once it has to be evicted from the cache because another one needs to be added, so the log file doesn't fill up with a bunch of allocations that get freed relatively quickly.
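Something like the following little sketch is what I have in mind. This is just an illustration, not something I have actually wired into aset.c; the hook points and all names here are made up, and the stack-trace part is left out:

/* Sketch of the leak-detector idea described above.  Purely hypothetical:
 * leak_track_alloc() would be called from AllocSetAlloc after a successful
 * allocation, leak_track_free() from AllocSetFree. */
#include <stdio.h>

#define LEAK_CACHE_SIZE 512     /* roughly "one page" of pointers */

static void *leak_cache[LEAK_CACHE_SIZE];
static int   leak_next = 0;

static void
leak_track_alloc(void *ptr)
{
    /* The slot we are about to reuse holds a pointer that was allocated
     * LEAK_CACHE_SIZE allocations ago and never freed in the meantime:
     * that is the interesting case, so log it (ideally with a stack trace). */
    if (leak_cache[leak_next] != NULL)
        fprintf(stderr, "possible leak: %p still live at eviction\n",
                leak_cache[leak_next]);

    leak_cache[leak_next] = ptr;
    leak_next = (leak_next + 1) % LEAK_CACHE_SIZE;
}

static void
leak_track_free(void *ptr)
{
    /* Cancel out a recent allocation: if it is still in the cache it gets
     * dropped silently and never pollutes the log. */
    for (int i = 0; i < LEAK_CACHE_SIZE; i++)
    {
        if (leak_cache[i] == ptr)
        {
            leak_cache[i] = NULL;
            return;
        }
    }
    /* Not found: it was either already evicted (and logged) or is older
     * than the cache window -- nothing to do. */
}

The point of the eviction rule is exactly what I described: pointers that are allocated and freed quickly cancel out inside the cache and never reach the log, so what accumulates in the log is only the long-lived traffic, which is what a leak would look like.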
This is very confusing. I just stepped over
Breakpoint 2, ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:904
904 printf("Hashjoin %p: increasing nbatch to %d because space = %zu\n",
(gdb) next
908 oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
(gdb) call MemoryContextStats(TopPortalContext)
and then checked my log file, and there was nothing there before the call to MemoryContextStats(TopPortalContext), so I don't understand where this printf output is ending up. Clearly *some* of it is in the log, but none of that "increasing nbatch" stuff is. I think there is something wrong with that HJDEBUG stuff. Oops, my calling MemoryContextStats really seems to help force this buffer to be flushed, because now I got more:
Hashjoin 0x118e4c8: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x118e4f8: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x1195278: initial nbatch = 1, nbuckets = 65536
Hashjoin 0x119b918: initial nbatch = 16, nbuckets = 65536
Hashjoin 0x1195278: initial nbatch = 1, nbuckets = 65536
Hashjoin 0x119b918: initial nbatch = 16, nbuckets = 65536
...
Hashjoin 0x13a8e68: initial nbatch = 16, nbuckets = 8192
Hashjoin 0x13a8e68: increasing nbatch to 32 because space = 4128933
Hashjoin 0x13a8e68: freed 148 of 10584 tuples, space now 4071106
Hashjoin 0x13a8e68: increasing nbatch to 64 because space = 4128826
Hashjoin 0x13a8e68: freed 544 of 10584 tuples, space now 3916296
Hashjoin 0x13a8e68: increasing nbatch to 128 because space = 4128846
Hashjoin 0x13a8e68: freed 10419 of 10585 tuples, space now 65570
Hashjoin 0x13a8e68: increasing nbatch to 256 because space = 4128829
Hashjoin 0x13a8e68: freed 10308 of 10734 tuples, space now 161815
Hashjoin 0x13a8e68: increasing nbatch to 512 because space = 4128908
Hashjoin 0x13a8e68: freed 398 of 10379 tuples, space now 3977787
Hashjoin 0x13a8e68: increasing nbatch to 1024 because space = 4129008
Hashjoin 0x13a8e68: freed 296 of 10360 tuples, space now 4013423
Hashjoin 0x13a8e68: increasing nbatch to 2048 because space = 4129133
Hashjoin 0x13a8e68: freed 154 of 10354 tuples, space now 4068786
Hashjoin 0x13a8e68: increasing nbatch to 4096 because space = 4129035
Hashjoin 0x13a8e68: freed 10167 of 10351 tuples, space now 72849
Hashjoin 0x242c9b0: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x2443ee0: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x2443aa0: initial nbatch = 1, nbuckets = 1024
Hashjoin 0x2443440: initial nbatch = 1, nbuckets = 65536
Hashjoin 0x2443330: initial nbatch = 16, nbuckets = 65536
Hashjoin 0x13a8e68: increasing nbatch to 8192 because space = 4128997
Hashjoin 0x12d0818: freed 10555 of 10560 tuples, space now 1983
Hashjoin 0x12d0818: increasing nbatch to 16384 because space = 4128957
Hashjoin 0x12d0818: freed 10697 of 10764 tuples, space now 25956TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used
...
And you see here clearly that there is a problem with this printf stdout
buffer.
Now, finally, I am at the stage where the breakpoint is firing rapidly:
(gdb) cont 100
Will ignore next 99 crossings of breakpoint 1. Continuing.
Breakpoint 1, ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:884
884 int oldnbatch = hashtable->nbatch;
(gdb) bt 7
#0 ExecHashIncreaseNumBatches (hashtable=0x12d0818) at nodeHash.c:884
#1 0x00000000006b93b1 in ExecHashTableInsert (hashtable=0x12d0818, slot=0x12bcf98, hashvalue=2161368) at nodeHash.c:1655
#2 0x00000000006bd600 in ExecHashJoinNewBatch (hjstate=0x12a4340) at nodeHashjoin.c:1051
#3 0x00000000006bc999 in ExecHashJoinImpl (pstate=0x12a4340, parallel=false) at nodeHashjoin.c:539
#4 0x00000000006bca23 in ExecHashJoin (pstate=0x12a4340) at nodeHashjoin.c:565
#5 0x00000000006a191f in ExecProcNodeInstr (node=0x12a4340) at execProcnode.c:461
#6 0x00000000006ceaad in ExecProcNode (node=0x12a4340) at ../../../src/include/executor/executor.h:247
(More stack frames follow...)
So it's not a red herring after all. But there is nothing new in the logfile from that HJDEBUG. Trying to flush stdout with this call:
MemoryContextStats(TopPortalContext)
...
ExecutorState: 782712360 total in 92954 blocks; 3626888 free (3126 chunks); 779085472 used
and nothing heard from HJDEBUG. So you really can't rely on that HJDEBUG
stuff. It's not working.
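My guess as to what is going on here -- and this is an assumption on my part, not something I have proven -- is that the HJDEBUG output is produced with a plain printf() to stdout, and when the server's stdout is redirected into a log file, stdio switches from line buffering to full buffering, so those lines sit in a userspace buffer until it happens to fill or something else flushes it. Here is a tiny standalone demo of that effect, nothing PostgreSQL-specific:

/* Standalone demo (not PostgreSQL code): run it as "./a.out > log" and
 * tail the log file.  With stdout redirected to a file, stdio is fully
 * buffered, so nothing shows up until the buffer (typically a few kB)
 * fills or the program exits -- unless you flush explicitly. */
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    for (int i = 0; i < 30; i++)
    {
        printf("Hashjoin-style debug line %d\n", i);
        /* uncomment to make each line appear immediately: */
        /* fflush(stdout); */
        sleep(1);
    }
    return 0;
}

If that is the explanation, then adding an fflush(stdout) right after the HJDEBUG printf calls, or making stdout line-buffered with setvbuf(stdout, NULL, _IOLBF, 0) early on, should make the messages show up promptly in the log.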
To summarize: the memory problem with the rapid firing of ExecHashIncreaseNumBatches is still occurring, confirmed as I reported earlier; it wasn't some red herring, it's just that the HJDEBUG stuff doesn't work. Also, there might be benefit in creating a little resident memory leak detector that keeps track of a single page of AllocSetAlloc pointers, cancels them out with matching AllocSetFree calls, and reports the overflow with a quick little stack trace.
On 4/20/2019 6:53, Justin Pryzby wrote:
>> The only problem is that I can't test that this actually would trigger the
>> memory problem, because I can't force the plan to use the right join, it
>> always reverts to the left join hashing the tmp_q:
> I think the table on the "OUTER" side is the one which needs to be iterated
> over (not hashed), in order to return each of its rows even if there are no
> join partners in the other table. In your original query, the small table was
> being hashed and the large table iterated; maybe that's whats important.
Maybe so. I am still trying to wrap my head around RIGHT vs. LEFT outer join and why there even is a difference, though.
-> Hash Right Join (cost=4255031.53..5530808.71 rows=34619 width=1197)
Hash Cond: (((q.documentinternalid)::text = (documentinformationsubject.documentinternalid)::text) AND ((r.targetinternalid)::text = (documentinformationsubject.actinternalid)::text))
-> Hash Right Join (cost=1341541.37..2612134.36 rows=13 width=341)
Hash Cond: (((documentinformationsubject_2.documentinternalid)::text = (q.documentinternalid)::text) AND ((documentinformationsubject_2.actinternalid)::text = (q.actinternalid)::text))
... from the TINY table tmp_q, the estimate of 13 rows ain't bad
-> Hash (cost=2908913.87..2908913.87 rows=34619 width=930)
-> Gather Merge (cost=2892141.40..2908913.87 rows=34619 width=930)
... from the HUGE tmp_r table, with 100 million rows (the estimate of 34619 is grossly wrong, but how could it know?) ...
Now, in a right join we include all rows from the right table, and only those from the left table that match the join key. I wonder why we don't just transform all of those to left joins then? Why are we not hashing only the "optional" side? The plan here seems to tell me without a doubt that it is hashing the big table: 100 million rows get hashed into the hash join, and then we iterate over the tiny table, and then ... now my mind boggles and I just don't know why there are right joins at all. But I see the plan is building the hash over 100 million rows.
>> which is of course much better, but when tmp_q and tmp_r are the results of
>> complex stuff that the planner can't estimate, then it gets it wrong, and
>> then the issue gets triggered because we are hashing on the big tmp_r, not
>> tmp_q.
> I was able to get something maybe promising ? "Batches: 65536 (originally 1)"
>
> I didn't get "Out of memory" error yet, but did crash the server with this one:
> postgres=# explain analyze WITH v AS( SELECT * FROM generate_series(1,99999999)i WHERE i%10<10 AND i%11<11 AND i%12<12 AND i%13<13 AND i%14<14 AND i%15<15 AND i%16<16 AND i%17<17 AND i%18<18 AND i%19<19 AND i%20<20 AND i%21<21 ) SELECT * FROM generate_series(1,99)k JOIN v ON k=i ;
>
> Note, on pg12dev this needs to be "with v AS MATERIALIZED".
>
> postgres=# SET work_mem='128kB';SET client_min_messages =log;SET log_statement_stats=on;explain(analyze,timing off) WITH v AS( SELECT * FROM generate_series(1,999999)i WHERE i%10<10 AND i%11<11 AND i%12<12 AND i%13<13 AND i%14<14 AND i%15<15 AND i%16<16 AND i%17<17 AND i%18<18 AND i%19<19 AND i%20<20 AND i%21<21 ) SELECT * FROM generate_series(1,99)k JOIN v ON k=i ;
> Hash Join (cost=70.04..83.84 rows=5 width=8) (actual rows=99 loops=1)
> Hash Cond: (k.k = v.i)
> CTE v
> -> Function Scan on generate_series i (cost=0.00..70.00 rows=1 width=4) (actual rows=999999 loops=1)
> Filter: (((i % 10) < 10) AND ((i % 11) < 11) AND ((i % 12) < 12) AND ((i % 13) < 13) AND ((i % 14) < 14) AND ((i % 15) < 15) AND ((i % 16) < 16) AND ((i % 17) < 17) AND ((i % 18) < 18) AND ((i % 19) < 19) AND ((i % 20) < 20) AND ((i % 21) < 21))
> -> Function Scan on generate_series k (cost=0.00..10.00 rows=1000 width=4) (actual rows=99 loops=1)
> -> Hash (cost=0.02..0.02 rows=1 width=4) (actual rows=999999 loops=1)
> Buckets: 4096 (originally 1024) Batches: 512 (originally 1) Memory Usage: 101kB
> -> CTE Scan on v (cost=0.00..0.02 rows=1 width=4) (actual rows=999999 loops=1)
Yes, I thought that with CTEs and functions one might be able to generate a test case, but I am still not seeing how you can trick the planner into this peculiar jointype JOIN_RIGHT, and whether that is requisite for triggering the problem.
Finally, I have tried to make a pstate pretty printer in explain.c:
void
DumpPlanState(PlanState *pstate) {
ExplainState *es = NewExplainState();
ExplainNode(pstate, NIL, NULL, NULL, es);
puts(es->str->data);
pfree(es->str->data);
}
but that didn't work, because unfortunately that ExplainNode function is destructive. It would be so nice to refactor this explain code so that there would be a completely non-destructive function that simply dumps the present pstate with all the information about its estimated and actual situation: how many iterations it has already accomplished, how many it estimates still to do, whether its original estimate panned out, etc. This would be tremendously useful for runtime debugging of queries; I think the utility of this can hardly be overstated. Even for end-user applications of some data warehouse, you could probe a long-running query every 5 seconds to see where its execution is. I can hardly think of a more useful low-hanging-fruit feature. I am sure that if PostgreSQL had originally been written in Java, this feature would already exist.
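For what it's worth, here is roughly where I would go next with it -- purely a sketch, untested, and meant to sit in explain.c next to the function above, since ExplainNode is static there. It assumes the destructive part is the InstrEndLoop() calls ExplainNode makes on each node's Instrumentation, so it snapshots those before the call and puts them back afterwards. All names (DumpPlanStateSafely, snap_instr_walker) are of course made up:

/* Sketch only -- untested.  Most of these headers explain.c already has. */
#include "commands/explain.h"
#include "executor/instrument.h"
#include "nodes/nodeFuncs.h"

typedef struct InstrSnapshot
{
    List   *nodes;      /* PlanState nodes that carry instrumentation */
    List   *copies;     /* palloc'd copies of their Instrumentation */
} InstrSnapshot;

static bool
snap_instr_walker(PlanState *ps, void *context)
{
    InstrSnapshot *snap = (InstrSnapshot *) context;

    if (ps->instrument)
    {
        Instrumentation *copy = palloc(sizeof(Instrumentation));

        memcpy(copy, ps->instrument, sizeof(Instrumentation));
        snap->nodes = lappend(snap->nodes, ps);
        snap->copies = lappend(snap->copies, copy);
    }
    return planstate_tree_walker(ps, snap_instr_walker, context);
}

void
DumpPlanStateSafely(PlanState *pstate)
{
    ExplainState *es = NewExplainState();
    InstrSnapshot snap = {NIL, NIL};
    ListCell   *ln,
               *lc;

    es->analyze = true;         /* we want the "actual" numbers too */
    (void) snap_instr_walker(pstate, &snap);

    ExplainNode(pstate, NIL, NULL, NULL, es);
    puts(es->str->data);

    /* put the running counters back the way they were */
    forboth(ln, snap.nodes, lc, snap.copies)
        memcpy(((PlanState *) lfirst(ln))->instrument,
               lfirst(lc), sizeof(Instrumentation));
}

Whether that is enough to make it safe to call from gdb on a live backend I honestly don't know, but it is the direction I would explore.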
regards and Happy Easter,
-Gunther