Re: Memory leak in WAL sender with pgoutput (v10~)

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Euler Taveira <euler(at)eulerto(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Memory leak in WAL sender with pgoutput (v10~)
Date: 2024-12-12 19:52:20
Message-ID: CAD21AoABd6mJwRM5Nr5LEwXP=f_TkGtYXP1jXSsSkC3Zpexq8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 11, 2024 at 9:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Dec 11, 2024 at 11:09 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Dec 10, 2024 at 6:13 PM Zhijie Hou (Fujitsu)
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > On Wednesday, December 11, 2024 2:14 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Tue, Dec 10, 2024 at 1:16 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > > wrote:
> > > > >
> > > > > On Tue, Dec 10, 2024 at 11:24 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > > > wrote:
> > > > > >
> > > > > > On Tue, Dec 10, 2024 at 8:54 AM vignesh C <vignesh21(at)gmail(dot)com>
> > > > wrote:
> > > > > > >
> > > > > > > On Tue, 10 Dec 2024 at 04:56, Michael Paquier <michael(at)paquier(dot)xyz>
> > > > wrote:
> > > > > > > >
> > > > > > > > On Mon, Dec 09, 2024 at 03:36:15PM +0530, Amit Kapila wrote:
> > > > > > > > > It couldn't solve the problem completely even in back-branches. The
> > > > > > > > > SQL API case I mentioned and tested by Hou-San in the email [1]
> > > > won't
> > > > > > > > > be solved.
> > > > > > > > >
> > > > > > > > > [1] -
> > > > https://www.postgresql.org/message-id/OS0PR01MB57166A4DA0ABBB94F
> > > > 2FBB28694362%40OS0PR01MB5716.jpnprd01.prod.outlook.com
> > > > > > > >
> > > > > > > > Yeah, exactly (wanted to reply exactly that yesterday but lacked time,
> > > > > > > > thanks!).
> > > > > > >
> > > > > > > Yes, that makes sense. How about something like the attached patch.
> > > > > > >
> > > > > >
> > > > > > - oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > > > > > - if (data->publications)
> > > > > > - {
> > > > > > - list_free_deep(data->publications);
> > > > > > - data->publications = NIL;
> > > > > > - }
> > > > > > + static MemoryContext pubctx = NULL;
> > > > > > +
> > > > > > + if (pubctx == NULL)
> > > > > > + pubctx = AllocSetContextCreate(CacheMemoryContext,
> > > > > > + "logical replication publication list context",
> > > > > > + ALLOCSET_SMALL_SIZES);
> > > > > > + else
> > > > > > + MemoryContextReset(pubctx);
> > > > > > +
> > > > > > + oldctx = MemoryContextSwitchTo(pubctx);
> > > > > >
> > > > > > Considering the SQL API case, why is it okay to allocate this context
> > > > > > under CacheMemoryContext?
> > > > > >
> > > > >
> > > > > On further thinking, we can't allocate it under
> > > > > LogicalDecodingContext->context because once that is freed at the end
> > > > > of SQL API pg_logical_slot_get_changes(), pubctx will be pointing to a
> > > > > dangling memory. One idea is that we use
> > > > > MemoryContextRegisterResetCallback() to invoke a reset callback
> > > > > function where we can reset pubctx but not sure if we want to go there
> > > > > in back branches. OTOH, the currently proposed fix won't leak memory
> > > > > on repeated calls to pg_logical_slot_get_changes(), so that might be
> > > > > okay as well.
> > > > >
> > > > > Thoughts?
> > > >
> > > > Alternative idea is to declare pubctx as a file static variable. And
> > > > we create the memory context under LogicalDecodingContext->context in
> > > > the startup callback and free it in the shutdown callback.
> > >
> > > I think when an ERROR occurs during the execution of the pg_logical_slot_xx()
> > > API, the shutdown callback function is not invoked. This would result in the
> > > static variable not being reset, which, I think, is why Amit mentioned the use
> > > of MemoryContextRegisterResetCallback().
> >
> > My idea is that since that new context is cleaned up together with its
> > parent context (LogicalDecodingContext->context), we unconditionally
> > set that new context to the static variable at the startup callback.
> > That being said, Amit's idea would be cleaner.
> >
>
> Your preference is not completely clear. Are you okay with the idea of
> Vignesh's currently proposed patch for back-branches, or do you prefer
> to use a memory context reset callback, or do you have a different
> idea that should be adopted for back-branches?

IIUC the current Vignesh's patch[1] doesn't solve the memory leak in
case of using logical decoding APIs, as you mentioned. I've tried the
idea of using memory context reset callback to reset pubctx. We need
to register the callback to LogicalContextDecodingContext->context,
meaning that we need to pass it to get_rel_sync_entry() (see
fix_memory_leak_v1.patch). I don't prefer this approach as it could
make backpatching complex in the future. Alternatively, we can declare
pubctx as a file static variable, create the memory context at the
startup callback, reset the pubctx at the shutdown callback, and use
the memory context reset callback to ensure the pubctx is reset (see
fix_memory_leak_v2.patch).Or I think we might not necessarily need to
use the memory context reset callback (see fix_memory_leak_v3.patch).
I prefer the latter two approaches.

Regards,

[1] https://www.postgresql.org/message-id/CALDaNm2C%3DmYNB9ZUS5t9irB2P0Tjm_r%2BnRVc717JdO%2BNtCVunw%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
fix_memory_leak_v2.patch application/octet-stream 3.3 KB
fix_memory_leak_v1.patch application/octet-stream 3.3 KB
fix_memory_leak_v3.patch application/octet-stream 2.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2024-12-12 19:57:28 Re: Support regular expressions with nondeterministic collations
Previous Message Jeff Davis 2024-12-12 19:22:09 Re: Remaining dependency on setlocale()