Re: Conflict Detection and Resolution

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Conflict Detection and Resolution
Date: 2024-08-26 09:46:04
Message-ID: CAJpy0uA+inSWx+00TfjMPMn-FXjOaoNJv8ATmRAX+UCDzQVhCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 26, 2024 at 2:23 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Aug 22, 2024 at 3:45 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> > >
> > > The patches have been rebased on the latest pgHead following the merge
> > > of the conflict detection patch [1].
> >
> > Thanks for working on patches.
> >
> > Summarizing the issues which need some suggestions/thoughts.
> >
> > 1)
> > For subscription based resolvers, currently the syntax implemented is:
> >
> > 1a)
> > CREATE SUBSCRIPTION <subname>
> > CONNECTION <conninfo> PUBLICATION <pubname>
> > CONFLICT RESOLVER
> > (conflict_type1 = resolver1, conflict_type2 = resolver2,
> > conflict_type3 = resolver3,...);
> >
> > 1b)
> > ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER
> > (conflict_type1 = resolver1, conflict_type2 = resolver2,
> > conflict_type3 = resolver3,...);
> >
> > Earlier the syntax suggested in [1] was:
> > CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname>
> > CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
> > CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';
> >
> > I think the currently implemented syntax is good as it has less
> > repetition, unless others think otherwise.
> >
> > ~~
> >
> > 2)
> > For subscription based resolvers, do we need a RESET command to reset
> > resolvers to default? Any one of below or both?
> >
> > 2a) reset all at once:
> > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS
> >
> > 2b) reset one at a time:
> > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type';
> >
> > The issue I see here is, to implement 1a and 1b, we have introduced
> > the 'RESOLVER' keyword. If we want to implement 2a, we will have to
> > introduce the 'RESOLVERS' keyword as well. But we can come up with
> > some alternative syntax if we plan to implement these. Thoughts?
> >
>
> It makes sense to have a RESET on the lines of (a) and (b). At this
> stage, we should do minimal in extending the syntax. How about RESET
> CONFLICT RESOLVER ALL for (a)?

Yes, the syntax looks good.

> > ~~
> >
> > 3) Regarding update_exists:
> >
> > 3a)
> > Currently update_exists resolver patch is kept separate. The reason
> > being, it performs resolution which will need deletion of multiple
> > rows. It will be good to discuss if we want to target this in the
> > first draft. Please see the example:
> >
> > create table tab (a int primary key, b int unique, c int unique);
> >
> > Pub: insert into tab values (1,1,1);
> > Sub:
> > insert into tab values (2,20,30);
> > insert into tab values (3,40,50);
> > insert into tab values (4,60,70);
> >
> > Pub: update tab set a=2,b=40,c=70 where a=1;
> >
> > The above 'update' on pub will result in 'update_exists' on sub and if
> > resolution is in favour of 'apply', then it will conflict with all the
> > three local rows of subscriber due to unique constraint present on all
> > three columns. Thus in order to resolve the conflict, it will have to
> > delete these 3 rows on sub:
> >
> > 2,20,30
> > 3,40,50
> > 4,60,70
> > and then update 1,1,1 to 2,40,70.
> >
> > Just need opinion on if we shall target this in the initial draft.
> >
>
> This case looks a bit complicated. It seems there is no other
> alternative than to delete the multiple rows. It is better to create a
> separate top-up patch for this and we can discuss in detail about this
> once the basic patch is in better shape.

Agreed.

>
> > 3b)
> > If we plan to implement this, we need to work on optimal design where
> > we can find all the conflicting rows at once and delete those.
> > Currently the implementation has been done using recursion i.e. find
> > one conflicting row, then delete it and then next and so on i.e. we
> > call apply_handle_update_internal() recursively. On initial code
> > review, I feel it is doable to scan all indexes at once and get
> > conflicting-tuple-ids in one go and get rid of recursion. It can be
> > attempted once we decide on 3a.
> >
>
> I suggest following the simplest strategy (even if that means calling
> the update function recursively) by adding comments on the optimal
> strategy. We can optimize it later as well.

Sure.

>
> > ~~
> >
> > 4)
> > Now for insert_exists and update_exists, we are doing a pre-scan of
> > all unique indexes to find conflict. Also there is post-scan to figure
> > out if the conflicting row is inserted meanwhile. This needs to be
> > reviewed for optimization. We need to avoid pre-scan wherever
> > possible. I think the only case for which it can be avoided is
> > 'ERROR'. For the cases where resolver is in favor of remote-apply, we
> > need to check conflict beforehand to avoid rollback of already
> > inserted data. And for the case where resolver is in favor of skipping
> > the change, then too we should know beforehand about the conflict to
> > avoid heap-insertion and rollback. Thoughts?
> >
>
> It makes sense to skip the pre-scan wherever possible. Your analysis
> sounds reasonable to me.
>
> > ~~
> >
> > 5)
> > Currently we only capture update_missing conflict i.e. we are not
> > distinguishing between the missing row and the deleted row. We had
> > discussed this in the past a couple of times. If we plan to target it
> > in draft 1, I can dig up all old emails and resume discussion on this.
> >
>
> This is a separate conflict detection project in itself. I am thinking
> about the solution to this problem. We will talk about this in a
> separate thread.
>
> > ~~
> >
> > 6)
> > Table-level resolves. There was a suggestion earlier to implement
> > table-level resolvers. The patch has been implemented to some extent,
> > it can be completed and posted when we are done reviewing subscription
> > level resolvers.
> >
>
> Yeah, it makes sense to do it after the subscription-level resolution
> patch is ready.
>
> --
> With Regards,
> Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-08-26 09:52:01 Re: Conflict detection and logging in logical replication
Previous Message Amit Kapila 2024-08-26 09:37:26 Re: Doc: fix the note related to the GUC "synchronized_standby_slots"