From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Memory leak from ExecutorState context?
Date: 2023-03-02 12:08:38
Message-ID: 20230302130838.717e888d@karst
Lists: pgsql-hackers
On Thu, 2 Mar 2023 01:30:27 +0100
Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> On 3/2/23 00:18, Jehan-Guillaume de Rorthais wrote:
> >>> ExecHashIncreaseNumBatches was really chatty, with hundreds of thousands
> >>> of calls, always short-circuited at 1048576, I guessed because of the
> >>> conditional block «/* safety check to avoid overflow */» appearing early
> >>> in this function.
> > [...] But what about this other short-cut then?
> >
> >     /* do nothing if we've decided to shut off growth */
> >     if (!hashtable->growEnabled)
> >         return;
> >
> > [...]
> >
> >     /*
> >      * If we dumped out either all or none of the tuples in the table,
> >      * disable further expansion of nbatch.  This situation implies that
> >      * we have enough tuples of identical hashvalues to overflow
> >      * spaceAllowed.  Increasing nbatch will not fix it since there's no
> >      * way to subdivide the group any more finely.  We have to just gut
> >      * it out and hope the server has enough RAM.
> >      */
> >     if (nfreed == 0 || nfreed == ninmemory)
> >     {
> >         hashtable->growEnabled = false;
> > #ifdef HJDEBUG
> >         printf("Hashjoin %p: disabling further increase of nbatch\n",
> >                hashtable);
> > #endif
> >     }
> >
> > If I guess correctly, the function is unable to split the current batch any
> > further, so it just sits and hopes. This is a much better suspect, and I can
> > surely track it from gdb.
>
> Yes, this would make much more sense - it'd be consistent with the
> hypothesis that this is due to the number of batches exploding (it's a
> protection against exactly that).
>
> You specifically mentioned the other check earlier, but now I realize
> you were just speculating that it might be the culprit.
Yes, sorry about that, I jumped on this speculation without actually digging
into it much...
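(For context, the overflow safety check I had first blamed sits at the top of
ExecHashIncreaseNumBatches; quoting it from memory, so the exact expression may
differ in this version:

    /* safety check to avoid overflow */
    if (oldnbatch > Max(1024, MAX_KILOBYTES))
        return;

Since 1048576 is well below Max(1024, MAX_KILOBYTES) (about 2 million), that
check never fires at this point; the early return we keep hitting must be the
growEnabled one above.)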
[...]
> But I have another idea - put a breakpoint on makeBufFile(), which is the
> bit that allocates the temp files, including the 8kB buffer, and print in
> which context we allocate that. I have a hunch we may be allocating it in
> the ExecutorState. That'd explain all the symptoms.
That's what I was wondering as well last night.
So, on your advice, I set a breakpoint on makeBufFile:

(gdb) info br
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000007229df in makeBufFile
        bt 10
        p CurrentMemoryContext.name
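(The commands were attached to the breakpoint in the usual way; this is a
reconstruction rather than a verbatim session:

    (gdb) break makeBufFile
    (gdb) commands
    > bt 10
    > p CurrentMemoryContext.name
    > end

so that each hit prints a 10-frame backtrace and the current context name.)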
Then, I disabled it and ran the query up to this memory usage (as reported by
top):

   VIRT    RES    SHR S  %CPU %MEM
  20.1g   7.0g  88504 t   0.0 22.5
Then, I enabled the breakpoint and looked at around 600 backtraces and context
names before getting bored. They **all** looked like this:
Breakpoint 1, BufFileCreateTemp (...) at buffile.c:201
201 in buffile.c
#0 BufFileCreateTemp (...) buffile.c:201
#1 ExecHashJoinSaveTuple (tuple=0x1952c180, ...) nodeHashjoin.c:1238
#2 ExecHashJoinImpl (parallel=false, pstate=0x31a6418) nodeHashjoin.c:398
#3 ExecHashJoin (pstate=0x31a6418) nodeHashjoin.c:584
#4 ExecProcNodeInstr (node=<optimized out>) execProcnode.c:462
#5 ExecProcNode (node=0x31a6418)
#6 ExecSort (pstate=0x31a6308)
#7 ExecProcNodeInstr (node=<optimized out>)
#8 ExecProcNode (node=0x31a6308)
#9 fetch_input_tuple (aggstate=aggstate(at)entry=0x31a5ea0)
$421643 = 0x99d7f7 "ExecutorState"
These 600-ish 8kB buffers were all allocated in "ExecutorState". I could
probably log many more of them if more checks/stats needed to be collected,
but it slows the query down a lot, granting it only 1-5% of CPU time instead
of the usual 100%.
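That matches frame #1 above: ExecHashJoinSaveTuple opens the batch file lazily
on first write, and the BufFile (with its 8kB buffer) is palloc'd in whatever
CurrentMemoryContext is active at that point. Paraphrased from memory of
nodeHashjoin.c, so details may differ in this version:

    void
    ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
                          BufFile **fileptr)
    {
        BufFile    *file = *fileptr;

        if (file == NULL)
        {
            /* First write to this batch file: BufFileCreateTemp allocates
             * the BufFile and its buffer in CurrentMemoryContext, which is
             * ExecutorState when we arrive here from ExecHashJoinImpl. */
            file = BufFileCreateTemp(false);
            *fileptr = file;
        }
        BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
        BufFileWrite(file, (void *) tuple, tuple->t_len);
    }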
So it's not exactly a leak, as the memory would be released at the end of the
query, but I suppose these buffers should be allocated in a shorter-lived
context to avoid this memory bloat, am I right?
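If so, a minimal sketch of what I have in mind (hypothetical and untested,
assuming the hashtable is reachable from the call site) would be to switch to
the hash table's own context around the file creation, so the buffers die with
the hash join rather than with the executor:

    /* Hypothetical sketch only, not a patch. */
    MemoryContext oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);

    file = BufFileCreateTemp(false);    /* buffer now lives in hashCxt */
    *fileptr = file;

    MemoryContextSwitchTo(oldcxt);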
> BTW with how many batches does the hash join start?
* batches went from 32 to 1048576 before growth was disabled
  (growEnabled = false), as suspected
* original and current nbuckets were both set to 1048576 immediately
* allowed space is set to work_mem (64MB), but current space usage is 1.3GB,
  as measured previously, just before the system refused further memory
  allocation
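Some back-of-the-envelope arithmetic on the worst case, assuming one inner and
one outer batch file per batch, each BufFile carrying its own 8kB buffer:

    1048576 batches * 2 files * 8kB = 16GB

Not every batch file exists yet, which would explain why "only" a few GB of
that upper bound show up in the process so far.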
Here are the full details of the hash table associated with the previous
backtrace:
(gdb) up
(gdb) up
(gdb) p *((HashJoinState*)pstate)->hj_HashTable
$421652 = {
  nbuckets = 1048576,
  log2_nbuckets = 20,
  nbuckets_original = 1048576,
  nbuckets_optimal = 1048576,
  log2_nbuckets_optimal = 20,
  buckets = {unshared = 0x68f12e8, shared = 0x68f12e8},
  keepNulls = true,
  skewEnabled = false,
  skewBucket = 0x0,
  skewBucketLen = 0,
  nSkewBuckets = 0,
  skewBucketNums = 0x0,
  nbatch = 1048576,
  curbatch = 0,
  nbatch_original = 32,
  nbatch_outstart = 1048576,
  growEnabled = false,
  totalTuples = 19541735,
  partialTuples = 19541735,
  skewTuples = 0,
  innerBatchFile = 0xdfcd168,
  outerBatchFile = 0xe7cd1a8,
  outer_hashfunctions = 0x68ed3a0,
  inner_hashfunctions = 0x68ed3f0,
  hashStrict = 0x68ed440,
  spaceUsed = 1302386440,
  spaceAllowed = 67108864,
  spacePeak = 1302386440,
  spaceUsedSkew = 0,
  spaceAllowedSkew = 1342177,
  hashCxt = 0x68ed290,
  batchCxt = 0x68ef2a0,
  chunks = 0x251f28e88,
  current_chunk = 0x0,
  area = 0x0,
  parallel_state = 0x0,
  batches = 0x0,
  current_chunk_shared = 1103827828993
}
For what it's worth, the context names are:
(gdb) p ((HashJoinState*)pstate)->hj_HashTable->hashCxt.name
$421657 = 0x99e3c0 "HashTableContext"
(gdb) p ((HashJoinState*)pstate)->hj_HashTable->batchCxt.name
$421658 = 0x99e3d1 "HashBatchContext"
Regards,