Re: Single transaction in the tablesync worker?

From: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Single transaction in the tablesync worker?
Date: 2020-12-07 00:50:31
Message-ID: CAGRY4nxnZhOm_QwUHdUsJKq80QwwncdtGR7EEYs1mUm-L8+MtQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, 5 Dec 2020, 10:03 Amit Kapila, <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Fri, Dec 4, 2020 at 7:12 PM Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> >
> > On Thu, Dec 3, 2020 at 7:24 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Dec 3, 2020 at 7:04 PM Ashutosh Bapat
> > > <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > > >
> > > > On Thu, Dec 3, 2020 at 2:55 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > The tablesync worker in logical replication performs the table data
> > > > > sync in a single transaction which means it will copy the initial data
> > > > > and then catch up with apply worker in the same transaction. There is
> > > > > a comment in LogicalRepSyncTableStart ("We want to do the table data
> > > > > sync in a single transaction.") saying so but I can't find the
> > > > > concrete theory behind the same. Is there any fundamental problem if
> > > > > we commit the transaction after initial copy and slot creation in
> > > > > LogicalRepSyncTableStart and then allow the apply of transactions as
> > > > > it happens in apply worker? I have tried doing so in the attached (a
> > > > > quick prototype to test) and didn't find any problems with regression
> > > > > tests. I have tried a few manual tests as well to see if it works and
> > > > > didn't find any problem. Now, it is quite possible that it is
> > > > > mandatory to do the way we are doing currently, or maybe something
> > > > > else is required to remove this requirement but I think we can do
> > > > > better with respect to comments in this area.
> > > >
> > > > If we commit the initial copy, the data upto the initial copy's
> > > > snapshot will be visible downstream. If we apply the changes by
> > > > committing changes per transaction, the data visible to the other
> > > > transactions will differ as the apply progresses.
> > > >
> > >
> > > It is not clear what you mean by the above. The way you have written
> > > appears that you are saying that instead of copying the initial data,
> > > I am saying to copy it transaction-by-transaction. But that is not the
> > > case. I am saying copy the initial data by using REPEATABLE READ
> > > isolation level as we are doing now, commit it and then process
> > > transaction-by-transaction till we reach sync-point (point till where
> > > apply worker has already received the data).
> >
> > Craig in his mail has clarified this. The changes after the initial
> > COPY will be visible before the table sync catches up.
> >
>
> I think the problem is not that the changes are visible after COPY
> rather it is that we don't have a mechanism to restart if it crashes
> after COPY unless we do all the sync up in one transaction. Assume we
> commit after COPY and then process transaction-by-transaction and it
> errors out (due to connection loss) or crashes, in-between one of the
> following transactions after COPY then after the restart we won't know
> from where to start for that relation. This is because the catalog
> (pg_subscription_rel) will show the state as 'd' (data is being
> copied) and the slot would have gone as it was a temporary slot. But
> as mentioned in one of my emails above [1] we can solve these problems
> which Craig also seems to be advocating for as there are many
> advantages of not doing the entire sync (initial copy + stream changes
> for that relation) in one single transaction. It will allow us to
> support decode of prepared xacts in the subscriber. Also, it seems
> pglogical already does processing transaction-by-transaction after the
> initial copy. The only thing which is not clear to me is why we
> haven't decided to go ahead initially and it would be probably better
> if the original authors would also chime-in to at least clarify the
> same.
>
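As a concrete illustration of the failure mode Amit describes: the subscriber tracks per-table sync progress in pg_subscription_rel, and a crash between the COPY and catch-up would leave the relation stuck in state 'd' with no slot to resume from. A sketch of how to observe that state (the subscription name is hypothetical):

```sql
-- On the subscriber: per-relation sync state for logical replication.
-- srsubstate is 'i' (init), 'd' (data copy in progress), 's' (sync done),
-- or 'r' (ready). A crash mid-sync leaves rows stuck in 'd', and the
-- temporary tablesync slot that held the restart point is gone.
SELECT srrelid::regclass AS relation,
       srsubstate,
       srsublsn
FROM pg_subscription_rel
WHERE srsubid = (SELECT oid FROM pg_subscription
                 WHERE subname = 'mysub');  -- hypothetical subscription name
```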

It's partly a resource management issue.

Replication origins are a limited resource. We need to use a replication
origin for any sync we want to be durable across restarts.

Then again, so are slots, and we use temp slots for each sync.
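Both resources are visible from SQL, which makes the pressure easy to check; note that replication origins share max_replication_slots as their ceiling. A quick sketch:

```sql
-- Compare replication-origin and slot usage against the configured limit
-- (origins are capped by max_replication_slots as well).
SELECT (SELECT count(*) FROM pg_replication_origin) AS origins_in_use,
       (SELECT count(*) FROM pg_replication_slots)  AS slots_in_use,
       current_setting('max_replication_slots')     AS max_replication_slots;
```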

If a sync fails, cleanup on the upstream side is simple with a temp slot.
With persistent slots we have more risk of creating upstream issues. But
then, so long as the subscriber exists, it can deal with that. And if the
subscriber no longer exists, its primary slot is an issue too.

It'd help if we could register pg_shdepend entries between catalog entries
and slots, and from a main subscription slot to any extra slots used for
resynchronization.

And I should write a patch for a resource retention summarisation view.
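A first cut of such a summarisation view might simply union the things that can pin WAL and catalog rows on a node. Purely a sketch, not an actual proposal; the view name is made up:

```sql
-- Hypothetical view unifying resource-retention sources:
-- replication slots, replication origins, and prepared transactions.
CREATE VIEW resource_retention AS
SELECT 'slot'::text      AS kind,
       slot_name         AS name,
       restart_lsn::text AS detail
FROM pg_replication_slots
UNION ALL
SELECT 'origin',
       external_id,
       remote_lsn::text
FROM pg_replication_origin_status
UNION ALL
SELECT 'prepared_xact',
       gid,
       prepared::text
FROM pg_prepared_xacts;
```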

> I am not sure why but it seems acceptable to original authors that the
> data of transactions are visible partially during the initial
> synchronization phase for a subscription.

I don't think there's much alternative there.

Pg would need some kind of cross-commit visibility control mechanism that
separates durable commit from visibility.
