Re: Logical Replica ReorderBuffer Size Accounting Issues

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
Cc: "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, Alex Richman <alexrichman(at)onesignal(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Niels Stevens <niels(dot)stevens(at)onesignal(dot)com>
Subject: Re: Logical Replica ReorderBuffer Size Accounting Issues
Date: 2024-10-16 20:51:34
Message-ID: CAD21AoDG+kKNr2C3ExNj=gDVDHfLeYEbpQbkV4K0nh3ZCcOU8g@mail.gmail.com
Lists: pgsql-bugs

On Mon, May 20, 2024 at 12:02 AM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
>
> Hi,
>
> Thank you for working on this issue.
> It seems that we have also faced the same issue.
>
> > On Wed, May 24, 2023 at 9:27 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> > wrote:
> >> Yes, it's because the above modification doesn't fix the memory
> >> accounting issue but only reduces memory bloat in some (extremely bad)
> >> cases. Without this modification, it was possible for the maximum
> >> actual memory usage to easily reach several tens of times
> >> logical_decoding_work_mem (e.g. 4GB vs. 256MB as originally reported).
> >> Since the reorderbuffer still doesn't account for memory
> >> fragmentation etc., it's still possible that the actual memory usage
> >> could reach several times logical_decoding_work_mem.
> >> In my environment, with the reproducer.sh you shared, the total actual
> >> memory usage reached about 430MB while logical_decoding_work_mem was
> >> 256MB. Probably, even if we use another type of memory allocator
> >> such as AllocSet, a similar issue will still happen. If we want to
> >> guarantee that the reorderbuffer memory usage never exceeds
> >> logical_decoding_work_mem, we would need to change how the
> >> reorderbuffer uses and accounts for memory, which would require much
> >> work, I guess.
>
> Considering that the manual says logical_decoding_work_mem "specifies
> the maximum amount of memory to be used by logical decoding", and that
> this would be easy for users to tune, it may be best to do this work.
> However...
>
> >>> One idea to deal with this issue is to choose the block sizes
> >>> carefully while measuring the performance as the comment shows:
> >>>
> >>>     /*
> >>>      * XXX the allocation sizes used below pre-date generation context's
> >>>      * block growing code. These values should likely be benchmarked and
> >>>      * set to more suitable values.
> >>>      */
> >>>     buffer->tup_context = GenerationContextCreate(new_ctx,
> >>>                                                   "Tuples",
> >>>                                                   SLAB_LARGE_BLOCK_SIZE,
> >>>                                                   SLAB_LARGE_BLOCK_SIZE,
> >>>                                                   SLAB_LARGE_BLOCK_SIZE);
>
> Since this idea can prevent the issue in some (though not all)
> situations, it may be a good mitigation measure.
> One concern is that this would cause more frequent malloc() calls, but
> that is better than memory bloat, isn't it?

FYI, I've just pushed the commit fixing this memory issue to all
supported branches[1]. This is just to let people, including the
reporter, know about the recent updates on this topic.

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1b9b6cc3456be0f6ab929107293b31c333270bc1
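
For anyone who doesn't want to open the link: assuming the committed fix
follows the block-size idea quoted above, the relevant change in
ReorderBufferAllocate() looks roughly like the sketch below (not the
literal committed hunk; see the commit for the authoritative diff).
SLAB_DEFAULT_BLOCK_SIZE is 8kB, versus the 8MB SLAB_LARGE_BLOCK_SIZE
used previously:

    /*
     * Sketch: create the decoded-tuple context with small fixed-size
     * blocks, so that blocks emptied after a transaction's tuples are
     * freed go back to malloc() quickly instead of being retained as
     * large, mostly-empty 8MB blocks.
     */
    buffer->tup_context = GenerationContextCreate(new_ctx,
                                                  "Tuples",
                                                  SLAB_DEFAULT_BLOCK_SIZE,
                                                  SLAB_DEFAULT_BLOCK_SIZE,
                                                  SLAB_DEFAULT_BLOCK_SIZE);

The tradeoff torikoshia mentioned (more frequent malloc() calls) still
applies, but with small blocks the amount of memory that can sit unused
inside partially filled blocks is bounded much more tightly.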

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
