Re: Logical Replica ReorderBuffer Size Accounting Issues

From: Alex Richman <alexrichman(at)onesignal(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Niels Stevens <niels(dot)stevens(at)onesignal(dot)com>
Subject: Re: Logical Replica ReorderBuffer Size Accounting Issues
Date: 2023-01-13 12:15:00
Message-ID: CAMnUB3pARWPi0Gq6ZYOKvfkNGOAU9xTYq1R69e37T=qdxD9WJg@mail.gmail.com
Lists: pgsql-bugs

On Fri, 13 Jan 2023 at 11:17, wangw(dot)fnst(at)fujitsu(dot)com <wangw(dot)fnst(at)fujitsu(dot)com>
wrote:

> I think I reproduced this problem as you suggested (update the entire table
> in parallel). And I can reproduce this problem on both current HEAD and
> REL_15_1. The memory used in rb->tup_context can reach 350M in HEAD and
> reach 600MB in REL_15_1.
>
Great, thanks for your help in reproducing this.

> But there's one more thing I'm not sure about. You mentioned in [2] that
> pg_stat_replication_slots shows 0 spilled or streamed bytes for any slots. I
> think this may be due to the timing of viewing pg_stat_replication_slots. In
> the function ReorderBufferCheckMemoryLimit, after invoking the function
> ReorderBufferSerializeTXN, even without actually freeing any used memory in
> rb->tup_context, I could see spill-related records in
> pg_stat_replication_slots. Could you please help to confirm this point if
> possible?
>
In the local reproduction, using the test scripts from the last two emails,
I do see some streamed bytes on the test slot. In production, however, I
still see 0 streamed or spilled bytes, and the walsenders there regularly
reach several gigabytes of RSS. I think it is the same root bug but at a far
greater scale in production (millions of tiny updates instead of 16 large
ones). I should also note that in production we have ~40
subscriptions/walsenders rather than the 1 in the test reproduction here, so
there's a lot of extra CPU churning through the work.
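
For anyone wanting to repeat the check, here is a rough sketch of how the
counters can be inspected. The column names are those of the
pg_stat_replication_slots view (PostgreSQL 14 and later); the helper name
slots_without_spill_or_stream and the sample rows are made up for
illustration and are not taken from our production setup. The rows are
assumed to come from any DB-API cursor after running the query.

```python
# Query against the pg_stat_replication_slots statistics view (PostgreSQL 14+).
SLOT_STATS_QUERY = """
SELECT slot_name, spill_bytes, stream_bytes, total_bytes
FROM pg_stat_replication_slots
ORDER BY slot_name
"""


def slots_without_spill_or_stream(rows):
    """Return the names of slots reporting 0 spilled and 0 streamed bytes.

    `rows` is an iterable of (slot_name, spill_bytes, stream_bytes,
    total_bytes) tuples, in the column order of SLOT_STATS_QUERY.
    This helper is hypothetical and only filters rows already fetched.
    """
    return [
        name
        for name, spill_bytes, stream_bytes, _total_bytes in rows
        if spill_bytes == 0 and stream_bytes == 0
    ]


# Illustrative values only, not real measurements.
sample_rows = [
    ("test_slot", 0, 4096, 8192),
    ("prod_slot_1", 0, 0, 1048576),
]
quiet_slots = slots_without_spill_or_stream(sample_rows)
```

A slot that shows up in the result while its walsender's RSS keeps growing
would match what I am describing for production.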

Thanks for your continued analysis of the GenerationAlloc/Free stuff - I'm
afraid I'm out of my depth there but let me know if you need any more
information on reproducing the issue or testing patches etc.

Thanks,
- Alex.
