From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com> |
Cc: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com> |
Subject: | Re: Column Filtering in Logical Replication |
Date: | 2022-04-18 12:36:00 |
Message-ID: | f54fc646-6bc1-b218-9831-8159ce145234@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 4/18/22 13:04, Amit Kapila wrote:
> On Thu, Apr 14, 2022 at 9:09 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> On Thu, Apr 14, 2022 at 8:32 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>>
>>>> The other part of the puzzle is the below check in the code:
>>>> /*
>>>> * If we reached the sync worker limit per subscription, just exit
>>>> * silently as we might get here because of an otherwise harmless race
>>>> * condition.
>>>> */
>>>> if (nsyncworkers >= max_sync_workers_per_subscription)
>>>>
>>>> It is not clear to me why this check is there, if this wouldn't be
>>>> there, the user would have got either a WARNING to increase the
>>>> max_logical_replication_workers or the apply worker would have been
>>>> restarted. Do you have any idea about this?
>>>
>>> Yeah, I'm also puzzled with this check. It seems that this function
>>> doesn't work well when the apply worker is not running and some
>>> tablesync workers are running. I initially thought that the apply
>>> worker calls to this function as many as tables that needs to be
>>> synced, but it checks the max_sync_workers_per_subscription limit
>>> before calling to logicalrep_worker_launch(). So I'm not really sure
>>> we need this check.
>>>
>>
>> I just hope that the original author Petr J. responds to this point. I
>> have added him to this email. This will help us to find the best
>> solution for this problem.
>>
>
> I did some more investigation for this code. It is added by commit [1]
> and the patch that led to this commit is first time posted on -hackers
> in email [2]. Now, neither the commit message nor the patch (comments)
> gives much idea as to why this part of code is added but I think there
> is some hint in the email [2]. In particular, read the paragraph in
> the email [2] that has the lines: ".... and limiting sync workers per
> subscription theoretically wasn't either (although I don't think it
> could happen in practice).".
>
> It seems that this check has been added to theoretically limit the
> sync workers even though that can't happen because apply worker
> ensures that before trying to launch the sync worker. Does this theory
> make sense to me? If so, I think we can change the check as: "if
> (OidIsValid(relid) && nsyncworkers >=
> max_sync_workers_per_subscription)" in launcher.c. This will serve the
> purpose of the original code and will solve the issue being discussed
> here. I think we can even backpatch this. What do you think?
>
Sounds reasonable to me. It's unfortunate there's no explanation of what
exactly is the commit message fixing (and why), but I doubt anyone will
remember the details after 5 years.
+1 to backpatching, I consider this to be a bug
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2022-04-18 13:22:00 | Re: TRAP: FailedAssertion("HaveRegisteredOrActiveSnapshot()", File: "toast_internals.c", Line: 670, PID: 19403) |
Previous Message | 汪洋 | 2022-04-18 11:37:16 | subscribe hackers |