Re: Memory leak in WAL sender with pgoutput (v10~)

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Memory leak in WAL sender with pgoutput (v10~)
Date: 2024-12-02 01:49:34
Message-ID: Z00SLoxwnjvuYeAv@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Nov 30, 2024 at 09:28:31AM +0100, Alvaro Herrera wrote:
> I'm not sure about your proposed fix. Isn't it simpler to have another
> memory context which we can reset instead of doing list_free_deep()? It
> doesn't have to be a global memory context -- since this is not
> reentrant and not referenced anywhere else, it can be a simple static
> variable in that block, as in the attached. I ran the stock tests (no
> sysbench) and at least it doesn't crash.
>
> This should be easily backpatchable also, since there's no ABI change.

Yeah. Using more memory contexts around these API calls done without
the cache manipulation is something I've also mentioned to Amit a few
days ago, and I'm feeling that this concept should be applied to a
broader area than just the cached publication list in light of this
thread and ea792bfd93ab. That's a discussion for later, likely. I'm
not sure how broad this should be and if this should be redesign to
make the whole context tracking easier. What I am sure of at this
stage is that for this workload we don't have any garbage left behind.

> + static MemoryContext pubmemcxt = NULL;
> +
> + if (pubmemcxt == NULL)
> + pubmemcxt = AllocSetContextCreate(CacheMemoryContext,
> + "publication list context",
> + ALLOCSET_DEFAULT_SIZES);
> + else
> + MemoryContextReset(pubmemcxt);
> + oldctx = MemoryContextSwitchTo(pubmemcxt);
> +
> data->publications = LoadPublications(data->publication_names);
> MemoryContextSwitchTo(oldctx);
> publications_valid = true;

Something like that works for me in stable branches, removing the need
for the list free as there is only one caller of LoadPublications()
currently. I've checked with my previous script, with the aggressive
invalidation tweak, in the case I'm missing something, and the memory
numbers are stable.

I am slightly concerned about the current design of GetPublication()
in the long-term, TBH. LoadPublications() has hidden the leak behind
two layers of routines in the WAL sender, and that's easy to miss once
you call anything that loads a Publication depending on how the caller
caches its data. So I would still choose for modifying the structure
on HEAD removing the pstrdup() for the publication name. Anyway, your
suggestion is also OK by me on top of that, that's less conflicts in
all the branches.

May I suggest the attached then? 0001 is your backpatch, 0002 would
be my HEAD-only suggestion, if that's OK for you and others of course.
--
Michael

Attachment Content-Type Size
v2-0001-Fix-memory-leak-in-pgoutput-with-publication-list.patch text/x-diff 1.8 KB
v2-0002-Avoid-extra-pstrdup-in-Publication.patch text/x-diff 1.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-12-02 02:15:23 Re: CREATE SCHEMA ... CREATE DOMAIN support
Previous Message Michail Nikolaev 2024-12-02 01:39:00 Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements