Re: Memory leak from ExecutorState context?

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Memory leak from ExecutorState context?
Date: 2023-03-02 12:08:38
Message-ID: 20230302130838.717e888d@karst
Lists: pgsql-hackers

On Thu, 2 Mar 2023 01:30:27 +0100
Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> On 3/2/23 00:18, Jehan-Guillaume de Rorthais wrote:
> >>> ExecHashIncreaseNumBatches was really chatty, having hundreds of thousands
> >>> of calls, always short-cut'ed to 1048576, I guess because of the
> >>> conditional block «/* safety check to avoid overflow */» appearing early
> >>> in this function.
> >[...] But what about this other short-cut then?
> >
> > /* do nothing if we've decided to shut off growth */
> > if (!hashtable->growEnabled)
> > return;
> >
> > [...]
> >
> > /*
> > * If we dumped out either all or none of the tuples in the table,
> > * disable further expansion of nbatch. This situation implies that we
> > * have enough tuples of identical hashvalues to overflow spaceAllowed.
> > * Increasing nbatch will not fix it since there's no way to subdivide
> > * the group any more finely. We have to just gut it out and hope the
> > * server has enough RAM.
> > */
> > if (nfreed == 0 || nfreed == ninmemory)
> > {
> > hashtable->growEnabled = false;
> > #ifdef HJDEBUG
> > printf("Hashjoin %p: disabling further increase of nbatch\n",
> > hashtable);
> > #endif
> > }
> >
> > If I guess correctly, the function is not able to split the current batch,
> > so it sits and hopes. This is a much better suspect and I can surely track
> > this from gdb.
>
> Yes, this would make much more sense - it'd be consistent with the
> hypothesis that this is due to number of batches exploding (it's a
> protection exactly against that).
>
> You specifically mentioned the other check earlier, but now I realize
> you've been just speculating it might be that.

Yes, sorry about that, I jumped on this speculation without actually digging
into it much...

[...]
> But I have another idea - put a breakpoint on makeBufFile() which is the
> bit that allocates the temp files including the 8kB buffer, and print in
> what context we allocate that. I have a hunch we may be allocating it in
> the ExecutorState. That'd explain all the symptoms.

That's what I was wondering as well last night.

So, on your advice, I set a breakpoint on makeBufFile:

(gdb) info br
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000007229df in makeBufFile
        bt 10
        p CurrentMemoryContext.name

Then, I disabled it and ran the query up to this mem usage:

VIRT RES SHR S %CPU %MEM
20.1g 7.0g 88504 t 0.0 22.5

Then, I enabled the breakpoint and looked at around 600 backtraces and context
names before getting bored. They **all** looked like this:

Breakpoint 1, BufFileCreateTemp (...) at buffile.c:201
201 in buffile.c
#0 BufFileCreateTemp (...) buffile.c:201
#1 ExecHashJoinSaveTuple (tuple=0x1952c180, ...) nodeHashjoin.c:1238
#2 ExecHashJoinImpl (parallel=false, pstate=0x31a6418) nodeHashjoin.c:398
#3 ExecHashJoin (pstate=0x31a6418) nodeHashjoin.c:584
#4 ExecProcNodeInstr (node=<optimized out>) execProcnode.c:462
#5 ExecProcNode (node=0x31a6418)
#6 ExecSort (pstate=0x31a6308)
#7 ExecProcNodeInstr (node=<optimized out>)
#8 ExecProcNode (node=0x31a6308)
#9 fetch_input_tuple (aggstate=aggstate(at)entry=0x31a5ea0)

$421643 = 0x99d7f7 "ExecutorState"

These 600-ish 8kB buffers were all allocated in "ExecutorState". I could
probably log many more of them if more checks/stats need to be collected, but
it really slows down the query a lot, leaving it only 1-5% of CPU time instead
of the usual 100%.

So it's not exactly a leak, as the memory would be released at the end of the
query, but I suppose these buffers should be allocated in a shorter-lived
context to avoid this memory bloat, am I right?

> BTW with how many batches does the hash join start?

* batches went from 32 to 1048576 before growEnabled was set to false, as
  suspected
* original and current nbuckets were set to 1048576 immediately
* allowed space is set to work_mem, but current space usage is 1.3GB, as
  measured previously, right before the system refused further memory
  allocations

Here are the full details about the hash associated with the previous backtrace:

(gdb) up
(gdb) up
(gdb) p *((HashJoinState*)pstate)->hj_HashTable
$421652 = {
nbuckets = 1048576,
log2_nbuckets = 20,
nbuckets_original = 1048576,
nbuckets_optimal = 1048576,
log2_nbuckets_optimal = 20,
buckets = {unshared = 0x68f12e8, shared = 0x68f12e8},
keepNulls = true,
skewEnabled = false,
skewBucket = 0x0,
skewBucketLen = 0,
nSkewBuckets = 0,
skewBucketNums = 0x0,
nbatch = 1048576,
curbatch = 0,
nbatch_original = 32,
nbatch_outstart = 1048576,
growEnabled = false,
totalTuples = 19541735,
partialTuples = 19541735,
skewTuples = 0,
innerBatchFile = 0xdfcd168,
outerBatchFile = 0xe7cd1a8,
outer_hashfunctions = 0x68ed3a0,
inner_hashfunctions = 0x68ed3f0,
hashStrict = 0x68ed440,
spaceUsed = 1302386440,
spaceAllowed = 67108864,
spacePeak = 1302386440,
spaceUsedSkew = 0,
spaceAllowedSkew = 1342177,
hashCxt = 0x68ed290,
batchCxt = 0x68ef2a0,
chunks = 0x251f28e88,
current_chunk = 0x0,
area = 0x0,
parallel_state = 0x0,
batches = 0x0,
current_chunk_shared = 1103827828993
}

For what it's worth, the context names are:

(gdb) p ((HashJoinState*)pstate)->hj_HashTable->hashCxt.name
$421657 = 0x99e3c0 "HashTableContext"

(gdb) p ((HashJoinState*)pstate)->hj_HashTable->batchCxt.name
$421658 = 0x99e3d1 "HashBatchContext"

Regards,
