Re: Memory leak in plpython3u (with testcase and patch)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mat Arye <mat(at)timescale(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory leak in plpython3u (with testcase and patch)
Date: 2025-01-10 21:37:56
Message-ID: 2619755.1736545076@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Mat Arye <mat(at)timescale(dot)com> writes:
> I found a memory leak in plpython3u. It happens when converting the
> argument for an spi plan execution (via plpy.execute() or similar). I've
> attached a sql file to create two plpython3u functions to reproduce the
> issue (memory_leak_test.sql).

I see the leak all right, and your patch does seem to fix it, so
I believe your diagnosis that PLy_output_convert() is what's leaking.

> I believe the issue is in the call to `PLy_output_convert` inside
> `PLy_spi_execute_plan`. I've attached a patch that reduces the memory usage
> to ~70MB from ~720 MB when using test_mem. I based what I did on what
> `PLy_input_convert` does. But I am not an expert in this part of the code
> so I am not sure the patch is correct. In particular, I don't fully grasp
> the comment in `PLy_input_convert` about why the scratch is reset before
> not after the conversion cycle and what that has to do with python
> refcounts. A review by an expert would be appreciated.

The corresponding concern here would be that some pass-by-reference
conversion result Datum could come back verbatim as an output of
SPI_execute_plan, and we mustn't reset the scratch context again
before we're done with using the Datum. Sadly, I don't think it
will work if that happens, because what we do as soon as
SPI_execute_plan finishes is to call PLy_spi_execute_fetch_result,
and that calls PLy_input_from_tuple, which is one of the things
that will zap the scratch context. So this doesn't feel very safe
... and indeed it falls over in plpython's existing regression tests.

However, I don't really see why we need to use that scratch context.
PLy_spi_execute_plan runs a subtransaction, so we could perfectly well
use the subtransaction's CurTransactionContext. (IOW, this wouldn't
even be an issue if PLy_spi_subtransaction_begin didn't forcibly
switch back to oldcontext.)

A couple of other points:

* PLy_spi_execute_plan already has an outer-level variable named
oldcontext, so this coding will draw complaints about a shadowed
local variable in compilers that are picky about that. But
I don't think you need the inner oldcontext variable anyway, since you
know perfectly well that the previous context is the outer oldcontext.
Just switch back to that after using CurTransactionContext.

* Looking around at other callers of PLy_output_convert, it seems
likely that PLy_cursor_plan() has this same problem. (I think we're
okay though with the calls from PLy_exec_function and
PLy_modify_tuple, since those could only happen right before return
from the plpython function; there's no path to iterate them enough
times to create a meaningful leak.)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2025-01-10 22:09:22 Re: Moving the vacuum GUCs' docs out of the Client Connection Defaults section
Previous Message Nathan Bossart 2025-01-10 20:46:45 Re: [PATCH] Hex-coding optimizations using SVE on ARM.