Re: Conflict Detection and Resolution

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Conflict Detection and Resolution
Date: 2024-08-26 08:53:12
Message-ID: CAA4eK1KrXT2ASDKYeoGiyqnqy-5i1UrTX+2BuQGGKb4JT-Ki-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 22, 2024 at 3:45 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
> >
> > The patches have been rebased on the latest pgHead following the merge
> > of the conflict detection patch [1].
>
> Thanks for working on patches.
>
> Summarizing the issues which need some suggestions/thoughts.
>
> 1)
> For subscription based resolvers, currently the syntax implemented is:
>
> 1a)
> CREATE SUBSCRIPTION <subname>
> CONNECTION <conninfo> PUBLICATION <pubname>
> CONFLICT RESOLVER
> (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3,...);
>
> 1b)
> ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER
> (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3,...);
>
> Earlier the syntax suggested in [1] was:
> CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname>
> CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
> CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';
>
> I think the currently implemented syntax is good as it has less
> repetition, unless others think otherwise.
>
> ~~
>
> 2)
> For subscription based resolvers, do we need a RESET command to reset
> resolvers to default? Any one of below or both?
>
> 2a) reset all at once:
> ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS
>
> 2b) reset one at a time:
> ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type';
>
> The issue I see here is, to implement 1a and 1b, we have introduced
> the 'RESOLVER' keyword. If we want to implement 2a, we will have to
> introduce the 'RESOLVERS' keyword as well. But we can come up with
> some alternative syntax if we plan to implement these. Thoughts?
>

It makes sense to have a RESET on the lines of (a) and (b). At this
stage, we should do minimal in extending the syntax. How about RESET
CONFLICT RESOLVER ALL for (a)?

> ~~
>
> 3) Regarding update_exists:
>
> 3a)
> Currently update_exists resolver patch is kept separate. The reason
> being, it performs resolution which will need deletion of multiple
> rows. It will be good to discuss if we want to target this in the
> first draft. Please see the example:
>
> create table tab (a int primary key, b int unique, c int unique);
>
> Pub: insert into tab values (1,1,1);
> Sub:
> insert into tab values (2,20,30);
> insert into tab values (3,40,50);
> insert into tab values (4,60,70);
>
> Pub: update tab set a=2,b=40,c=70 where a=1;
>
> The above 'update' on pub will result in 'update_exists' on sub and if
> resolution is in favour of 'apply', then it will conflict with all the
> three local rows of subscriber due to unique constraint present on all
> three columns. Thus in order to resolve the conflict, it will have to
> delete these 3 rows on sub:
>
> 2,20,30
> 3,40,50
> 4,60,70
> and then update 1,1,1 to 2,40,70.
>
> Just need opinion on if we shall target this in the initial draft.
>

This case looks a bit complicated. It seems there is no other
alternative than to delete the multiple rows. It is better to create a
separate top-up patch for this and we can discuss in detail about this
once the basic patch is in better shape.

> 3b)
> If we plan to implement this, we need to work on optimal design where
> we can find all the conflicting rows at once and delete those.
> Currently the implementation has been done using recursion i.e. find
> one conflicting row, then delete it and then next and so on i.e. we
> call apply_handle_update_internal() recursively. On initial code
> review, I feel it is doable to scan all indexes at once and get
> conflicting-tuple-ids in one go and get rid of recursion. It can be
> attempted once we decide on 3a.
>

I suggest following the simplest strategy (even if that means calling
the update function recursively) by adding comments on the optimal
strategy. We can optimize it later as well.

> ~~
>
> 4)
> Now for insert_exists and update_exists, we are doing a pre-scan of
> all unique indexes to find conflict. Also there is post-scan to figure
> out if the conflicting row is inserted meanwhile. This needs to be
> reviewed for optimization. We need to avoid pre-scan wherever
> possible. I think the only case for which it can be avoided is
> 'ERROR'. For the cases where resolver is in favor of remote-apply, we
> need to check conflict beforehand to avoid rollback of already
> inserted data. And for the case where resolver is in favor of skipping
> the change, then too we should know beforehand about the conflict to
> avoid heap-insertion and rollback. Thoughts?
>

It makes sense to skip the pre-scan wherever possible. Your analysis
sounds reasonable to me.

> ~~
>
> 5)
> Currently we only capture update_missing conflict i.e. we are not
> distinguishing between the missing row and the deleted row. We had
> discussed this in the past a couple of times. If we plan to target it
> in draft 1, I can dig up all old emails and resume discussion on this.
>

This is a separate conflict detection project in itself. I am thinking
about the solution to this problem. We will talk about this in a
separate thread.

> ~~
>
> 6)
> Table-level resolves. There was a suggestion earlier to implement
> table-level resolvers. The patch has been implemented to some extent,
> it can be completed and posted when we are done reviewing subscription
> level resolvers.
>

Yeah, it makes sense to do it after the subscription-level resolution
patch is ready.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anthonin Bonnefoy 2024-08-26 08:54:56 Re: Set query_id for query contained in utility statement
Previous Message chungui.wcg 2024-08-26 08:45:17 Re: how to log into commitfest.postgresql.org and begin review patch