Re: BUG #17716: walsender process hang while decoding 'DROP PUBLICATION' XLOG

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Bowen Shi <zxwsbg12138(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: BUG #17716: walsender process hang while decoding 'DROP PUBLICATION' XLOG
Date: 2022-12-20 12:02:47
Message-ID: CAJpy0uDD4k=GU6YjRKt83cZDNuvkTbOV8OL6dZiRtvFQbhMJGA@mail.gmail.com
Lists: pgsql-bugs

Hello,
I tried to reproduce the lag with a larger test case, i.e. I added more
tables to pub_t_large to increase the number of command_ids and added a huge
number of tables to the working publication pub_t to increase the number of
entries in the rel-cache, but no luck.
No noticeable lag was observed on HEAD with the new invalidation mechanism.

thanks
Shveta

On Tue, Dec 20, 2022 at 11:40 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> Hello,
> The idea looks good to me. For the 'relation schema cache' (the pgoutput one), on receiving an invalidation message for one hash value, we invalidate the complete cache, as there is no way to find the entry corresponding to that hash value; thus your proposed fix will make a good difference. But I feel it makes sense on HEAD as well.
>
> This complete cache invalidation happens multiple times even on HEAD (10k times for the given case). The cache is mostly empty in the given test case, but consider the case where we have a huge number of publications and subscriptions (so that this cache has a huge number of entries) and then we try to drop one large publication with, say, 40k-50k tables. In that case we might see slowness while traversing and invalidating the cache on HEAD as well. The test case with increased magnitude can be tried on HEAD once to see whether we need the fix on HEAD or not.
>
> thanks
> Shveta
>
>
> On Mon, Dec 19, 2022 at 5:52 PM Bowen Shi <zxwsbg12138(at)gmail(dot)com> wrote:
>>
>> Hello,
>> Thanks for your advice. I ran some tests, and this problem can't be
>> reproduced in PG 14+ versions. I think adding a new XLOG type will help
>> resolve this problem. But I think the following patch may be helpful
>> for PG 13.
>>
>> The invalidation contains two parts: pgoutput and relfilenodeMap. We
>> have no way to optimize the relfilenodeMap part, as has been
>> discussed in previous mails
>> https://www.postgresql.org/message-id/CANDwggKYveEtXjXjqHA6RL3AKSHMsQyfRY6bK+NqhAWJyw8psQ@mail.gmail.com.
>>
>> However, I'd like to contribute a patch to fix the pgoutput part. We can skip
>> invalidating the caches after the first time by using a lazy tag, and this works.
>> It almost doubles walsender performance while decoding this XLOG.
>>
>> I used the test from the last email and reduced the number of relations in
>> the publications to 1000; the test results are as follows:
>>
>> Before optimization: 76 min
>> After optimization: 35 min
>>
>> Though the result is not good enough, I think this patch is still worthwhile.
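
For readers following the thread, the "lazy tag" idea quoted above can be
illustrated with a minimal, self-contained sketch. This is not the proposed
patch and does not use PostgreSQL's actual data structures: CacheEntry,
all_invalid, invalidate_all, revalidate and entries_visited are hypothetical
names chosen for illustration. It only shows why repeated whole-cache
invalidations become cheap once a flag records that every entry is already
invalid.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SIZE 1000

/* Hypothetical stand-in for one relation cache entry. */
typedef struct
{
    bool valid;
} CacheEntry;

static CacheEntry cache[CACHE_SIZE];

/* The "lazy tag": true while every entry is known to be invalid. */
static bool all_invalid = false;

/* Counts how many entries were touched, to make the saving visible. */
static long entries_visited = 0;

/* Invalidate the whole cache, skipping the scan if nothing became valid
 * since the previous full invalidation. */
static void
invalidate_all(void)
{
    if (all_invalid)
        return;

    for (size_t i = 0; i < CACHE_SIZE; i++)
    {
        cache[i].valid = false;
        entries_visited++;
    }
    all_invalid = true;
}

/* Rebuilding any entry clears the tag, so the next invalidation scans again. */
static void
revalidate(size_t i)
{
    cache[i].valid = true;
    all_invalid = false;
}
```

In this sketch, 10k back-to-back calls to invalidate_all() scan the cache only
once; a full scan is needed again only after revalidate() marks some entry
valid.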
