Re: Conflict Detection and Resolution

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Conflict Detection and Resolution
Date: 2024-08-22 10:14:50
Message-ID: CAJpy0uBrXZE6LLofX5tc8WOm5F+FNgnQjRLQerOY8cOqqvtrNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
>
> The patches have been rebased on the latest pgHead following the merge
> of the conflict detection patch [1].

Thanks for working on patches.

Summarizing the issues which need some suggestions/thoughts.

1)
For subscription based resolvers, currently the syntax implemented is:

1a)
CREATE SUBSCRIPTION <subname>
CONNECTION <conninfo> PUBLICATION <pubname>
CONFLICT RESOLVER
(conflict_type1 = resolver1, conflict_type2 = resolver2,
conflict_type3 = resolver3,...);

1b)
ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER
(conflict_type1 = resolver1, conflict_type2 = resolver2,
conflict_type3 = resolver3,...);

Earlier the syntax suggested in [1] was:
CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname>
CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';

I think the currently implemented syntax is good as it has less
repetition, unless others think otherwise.

~~

2)
For subscription based resolvers, do we need a RESET command to reset
resolvers to default? Any one of below or both?

2a) reset all at once:
ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS

2b) reset one at a time:
ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type';

The issue I see here is, to implement 1a and 1b, we have introduced
the 'RESOLVER' keyword. If we want to implement 2a, we will have to
introduce the 'RESOLVERS' keyword as well. But we can come up with
some alternative syntax if we plan to implement these. Thoughts?

~~

3) Regarding update_exists:

3a)
Currently update_exists resolver patch is kept separate. The reason
being, it performs resolution which will need deletion of multiple
rows. It will be good to discuss if we want to target this in the
first draft. Please see the example:

create table tab (a int primary key, b int unique, c int unique);

Pub: insert into tab values (1,1,1);
Sub:
insert into tab values (2,20,30);
insert into tab values (3,40,50);
insert into tab values (4,60,70);

Pub: update tab set a=2,b=40,c=70 where a=1;

The above 'update' on pub will result in 'update_exists' on sub and if
resolution is in favour of 'apply', then it will conflict with all the
three local rows of subscriber due to unique constraint present on all
three columns. Thus in order to resolve the conflict, it will have to
delete these 3 rows on sub:

2,20,30
3,40,50
4,60,70
and then update 1,1,1 to 2,40,70.

Just need opinion on if we shall target this in the initial draft.

3b)
If we plan to implement this, we need to work on optimal design where
we can find all the conflicting rows at once and delete those.
Currently the implementation has been done using recursion i.e. find
one conflicting row, then delete it and then next and so on i.e. we
call apply_handle_update_internal() recursively. On initial code
review, I feel it is doable to scan all indexes at once and get
conflicting-tuple-ids in one go and get rid of recursion. It can be
attempted once we decide on 3a.

~~

4)
Now for insert_exists and update_exists, we are doing a pre-scan of
all unique indexes to find conflict. Also there is post-scan to figure
out if the conflicting row is inserted meanwhile. This needs to be
reviewed for optimization. We need to avoid pre-scan wherever
possible. I think the only case for which it can be avoided is
'ERROR'. For the cases where resolver is in favor of remote-apply, we
need to check conflict beforehand to avoid rollback of already
inserted data. And for the case where resolver is in favor of skipping
the change, then too we should know beforehand about the conflict to
avoid heap-insertion and rollback. Thoughts?

~~

5)
Currently we only capture update_missing conflict i.e. we are not
distinguishing between the missing row and the deleted row. We had
discussed this in the past a couple of times. If we plan to target it
in draft 1, I can dig up all old emails and resume discussion on this.

~~

6)
Table-level resolves. There was a suggestion earlier to implement
table-level resolvers. The patch has been implemented to some extent,
it can be completed and posted when we are done reviewing subscription
level resolvers.

~~

[1]: https://www.postgresql.org/message-id/CAA4eK1LhD%3DC5UwDeKxC_5jK4_ADtM7g%2BMoFW9qhziSxHbVVfeQ%40mail.gmail.com

For clock-skew and timestamp based resolution, if needed, I will post
another email for the design items where suggestions are needed.

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nishant Sharma 2024-08-22 10:30:13 Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.
Previous Message Pavel Borisov 2024-08-22 10:02:21 Re: type cache cleanup improvements