RE: Memory leak in WAL sender with pgoutput (v10~)

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Memory leak in WAL sender with pgoutput (v10~)
Date: 2024-12-03 08:36:42
Message-ID: OS0PR01MB57166A4DA0ABBB94F2FBB28694362@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, December 3, 2024 4:28 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Dec 3, 2024 at 11:57 AM Michael Paquier <michael(at)paquier(dot)xyz>
> wrote:
> >
> > On Tue, Dec 03, 2024 at 09:50:42AM +0530, Amit Kapila wrote:
> > > But that suits the current design more. We allocate PGOutputData and
> > > other contexts in that structure in a "Logical decoding context". A
> > > few of its members (publications, publication_names) residing in
> > > totally unrelated contexts sounds odd. In the first place, we don't
> > > need to allocate publications under CacheMemoryContext, they should
> > > be allocated in PGOutputData->cachectx. However, because we need to
> > > free those entirely at one-shot during invalidation processing, we
> > > could use a new context as a child context of
> > > PGOutputData->cachectx. Unless I am missing something, the current
> > > memory context usage appears more like a coding convenience than a
> thoughtful design decision.
> >
> > PGOutputData->cachectx has been introduced in 2022 in commit
> > PGOutputData->52e4f0cd4,
> > while the decision to have RelationSyncEntry and the publication list
> > in CacheMemoryContext gets down to v10 where this logical replication
> > has been introduced. This was a carefully-thought choice back then
> > because this is data that belongs to the process cache, so yes, this
> > choice makes sense to me.
> >
>
> The parent structure (PGOutputData) was stored in the "Logical decoding
> context" even in v11. So, how does storing its member 'publications' in
> CacheMemoryContext a good idea? It is possible that we are leaking memory
> while doing decoding via SQL APIs where we free decoding context after
> getting changes though I haven't tested the same.

Right. I think I have faced this memory leak recently. It might be true for
walsender that 'publications' is a per-process content. But SQL APIs might use
different publication names each time during execution.

I can reproduce the memory leak due to allocating the publication
names under CacheMemoryContext like the following:

--
CREATE PUBLICATION pub FOR ALL TABLES;
CREATE TABLE stream_test(a int);
SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'pgoutput');
INSERT INTO stream_test SELECT generate_series(1, 2, 1);

- he backend's memory usage increases with each execution of the following function
SELECT count(*) FROM pg_logical_slot_peek_binary_changes('isolation_slot', NULL, NULL, 'proto_version', '4', 'publication_names', 'pub,pub,........ <lots of pub names>');

Best Regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nazir Bilal Yavuz 2024-12-03 08:52:28 Re: meson missing test dependencies
Previous Message Andrei Lepikhov 2024-12-03 08:33:19 Re: An inefficient query caused by unnecessary PlaceHolderVar