From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Mat Arye <mat(at)timescale(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Memory leak in plpython3u (with testcase and patch) |
Date: | 2025-01-10 21:37:56 |
Message-ID: | 2619755.1736545076@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Mat Arye <mat(at)timescale(dot)com> writes:
> I found a memory leak in plpython3u. It happens when converting the
> argument for an spi plan execution (via plpy.execute() or similar). I've
> attached a sql file to create two plpython3u functions to reproduce the
> issue (memory_leak_test.sql).
I see the leak all right, and your patch does seem to fix it, so
I believe your diagnosis that PLy_output_convert() is what's leaking.
> I believe the issue is in the call to `PLy_output_convert` inside
> `PLy_spi_execute_plan`. I've attached a patch that reduces the memory usage
> to ~70MB from ~720 MB when using test_mem. I based what I did on what
> `PLy_input_convert` does. But I am not an expert in this part of the code
> so I am not sure the patch is correct. In particular, I don't fully grasp
> the comment in `PLy_input_convert` about why the scratch is reset before
> not after the conversion cycle and what that has to do with python
> refcounts. A review by an expert would be appreciated.
The corresponding concern here would be that some pass-by-reference
conversion result Datum could come back verbatim as an output of
SPI_execute_plan, and we mustn't reset the scratch context again
before we're done with using the Datum. Sadly, I don't think it
will work if that happens, because what we do as soon as
SPI_execute_plan finishes is to call PLy_spi_execute_fetch_result,
and that calls PLy_input_from_tuple, which is one of the things
that will zap the scratch context. So this doesn't feel very safe
... and indeed it falls over in plpython's existing regression tests.
However, I don't really see why we need to use that scratch context.
PLy_spi_execute_plan runs a subtransaction, so we could perfectly well
use the subtransaction's CurTransactionContext. (IOW, this wouldn't
even be an issue if PLy_spi_subtransaction_begin didn't forcibly
switch back to oldcontext.)
A couple of other points:
* PLy_spi_execute_plan already has an outer-level variable named
oldcontext, so this coding will draw complaints about a shadowed
local variable in compilers that are picky about that. But
I don't think you need the inner oldcontext variable anyway, since you
know perfectly well that the previous context is the outer oldcontext.
Just switch back to that after using CurTransactionContext.
* Looking around at other callers of PLy_output_convert, it seems
likely that PLy_cursor_plan() has this same problem. (I think we're
okay though with the calls from PLy_exec_function and
PLy_modify_tuple, since those could only happen right before return
from the plpython function; there's no path to iterate them enough
times to create a meaningful leak.)
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Melanie Plageman | 2025-01-10 22:09:22 | Re: Moving the vacuum GUCs' docs out of the Client Connection Defaults section |
Previous Message | Nathan Bossart | 2025-01-10 20:46:45 | Re: [PATCH] Hex-coding optimizations using SVE on ARM. |