From: | Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Memory leak from ExecutorState context? |
Date: | 2023-03-27 21:13:23 |
Message-ID: | 20230327231323.08277083@karst |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Mon, 20 Mar 2023 15:12:34 +0100
Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:
> On Mon, 20 Mar 2023 09:32:17 +0100
> Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> > >> * Patch 1 could be rebased/applied/backpatched
> > >
> > > Would it help if I rebase Patch 1 ("move BufFile stuff into separate
> > > context")?
> >
> > Yeah, I think this is something we'd want to do. It doesn't change the
> > behavior, but it makes it easier to track the memory consumption etc.
>
> Will do this week.
Please, find in attachment a patch to allocate bufFiles in a dedicated context.
I picked up your patch, backpatch'd it, went through it and did some minor
changes to it. I have some comment/questions thought.
1. I'm not sure why we must allocate the "HashBatchFiles" new context
under ExecutorState and not under hashtable->hashCxt?
The only references I could find was in hashjoin.h:30:
/* [...]
* [...] (Exception: data associated with the temp files lives in the
* per-query context too, since we always call buffile.c in that context.)
And in nodeHashjoin.c:1243:ExecHashJoinSaveTuple() (I reworded this
original comment in the patch):
/* [...]
* Note: it is important always to call this in the regular executor
* context, not in a shorter-lived context; else the temp file buffers
* will get messed up.
But these are not explanation of why BufFile related allocations must be under
a per-query context.
2. Wrapping each call of ExecHashJoinSaveTuple() with a memory context switch
seems fragile as it could be forgotten in futur code path/changes. So I
added an Assert() in the function to make sure the current memory context is
"HashBatchFiles" as expected.
Another way to tie this up might be to pass the memory context as argument to
the function.
... Or maybe I'm over precautionary.
3. You wrote:
>> A separate BufFile memory context helps, although people won't see it
>> unless they attach a debugger, I think. Better than nothing, but I was
>> wondering if we could maybe print some warnings when the number of batch
>> files gets too high ...
So I added a WARNING when batches memory are exhausting the memory size
allowed.
+ if (hashtable->fileCxt->mem_allocated > hashtable->spaceAllowed)
+ elog(WARNING, "Growing number of hash batch is exhausting memory");
This is repeated on each call of ExecHashIncreaseNumBatches when BufFile
overflows the memory budget. I realize now I should probably add the memory
limit, the number of current batch and their memory consumption.
The message is probably too cryptic for a user. It could probably be
reworded, but some doc or additionnal hint around this message might help.
4. I left the debug messages for some more review rounds
Regards,
Attachment | Content-Type | Size |
---|---|---|
0001-Allocate-hash-batches-related-BufFile-in-a-dedicated.patch | text/x-patch | 6.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Pryzby | 2023-03-27 21:18:41 | Re: SQL/JSON revisited |
Previous Message | Peter Geoghegan | 2023-03-27 20:51:38 | Re: pgsql: amcheck: Fix verify_heapam for tuples where xmin or xmax is 0. |