From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com> |
Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com> |
Subject: | Re: Single transaction in the tablesync worker? |
Date: | 2020-12-07 08:51:11 |
Message-ID: | CAA4eK1JYoxoa=GdQ73G1Ohz+b2jfAECFFAHtvkj-+qPJKUsNgA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Dec 7, 2020 at 9:21 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Dec 7, 2020 at 6:20 AM Craig Ringer
> <craig(dot)ringer(at)enterprisedb(dot)com> wrote:
> >
>
> >>
> >> I am not sure why but it seems acceptable to original authors that the
> >> data of transactions are visibly partially during the initial
> >> synchronization phase for a subscription.
> >
> >
> > I don't think there's much alternative there.
> >
>
> I am not sure about this. I think it is primarily to allow some more
> parallelism among apply and sync workers. One primitive way to achieve
> parallelism and don't have this problem is to allow apply worker to
> wait till all the tablesync workers are in DONE state.
>
As the slot of apply worker is created before all the tablesync
workers it should never miss any LSN which tablesync workers would
have processed. Also, the table sync workers should not process any
xact if the apply worker has not processed anything. I think tablesync
currently always processes one transaction (because we call
process_sync_tables at commit of a txn) even if that is not required
to be in sync with the apply worker. This should solve both the
problems (a) visibility of partial transactions (b) allow prepared
transactions because tablesync worker no longer needs to combine
multiple transactions data.
I think the other advantages of this would be that it would reduce the
load (both CPU and I/O) on the publisher-side by allowing to decode
the data only once instead of for each table sync worker once and
separately for the apply worker. I think it will use fewer resources
to finish the work.
Is there any flaw in this idea which I am missing?
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2020-12-07 09:00:17 | Re: Logical archiving |
Previous Message | vignesh C | 2020-12-07 08:32:22 | Re: Parallel copy |