Re: Conflict Detection and Resolution

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Conflict Detection and Resolution
Date: 2024-06-19 06:33:26
Message-ID: CAA4eK1KGOJ8+TSHFcTx87K8VVLpA_WRV2DkmO=Mbe4ocUtPz1w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 11, 2024 at 3:12 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Sat, Jun 8, 2024 at 3:52 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> On Fri, Jun 7, 2024 at 5:39 PM Ashutosh Bapat
>> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>> >
>> > On Thu, Jun 6, 2024 at 5:16 PM Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
>> >>
>> >> >
>> >>
>> >> Here are more use cases of the "earliest_timestamp_wins" resolution method:
>> >> 1) Applications where the record of first occurrence of an event is
>> >> important. For example, sensor based applications like earthquake
>> >> detection systems, capturing the first seismic wave's time is crucial.
>> >> 2) Scheduling systems, like appointment booking, prioritize the
>> >> earliest request when handling concurrent ones.
>> >> 3) In contexts where maintaining chronological order is important -
>> >> a) Social media platforms display comments ensuring that the
>> >> earliest ones are visible first.
>> >> b) Finance transaction processing systems rely on timestamps to
>> >> prioritize the processing of transactions, ensuring that the earliest
>> >> transaction is handled first
>> >
>> >
>> > Thanks for sharing examples. However, these scenarios would be handled by the application and not during replication. What we are discussing here is the timestamp when a row was updated/inserted/deleted (or rather when the transaction that updated row committed/became visible) and not a DML on column which is of type timestamp. Some implementations use a hidden timestamp column but that's different from a user column which captures timestamp of (say) an event. The conflict resolution will be based on the timestamp when that column's value was recorded in the database which may be different from the value of the column itself.
>> >
>>
>> It depends on how these operations are performed. For example, the
>> appointment booking system could be prioritized via a transaction
>> updating a row with columns emp_name, emp_id, reserved, time_slot.
>> Now, if two employees at different geographical locations try to book
>> the calendar, the earlier transaction will win.
>
>
> I doubt that it would be that simple. The application will have to intervene and tell one of the employees that their reservation has failed. It looks natural that the first one to reserve the room should get the reservation, but implementing that is more complex than resolving a conflict in the database. In fact, mostly it will be handled outside database.
>

Sure, the application needs some handling but I have tried to explain
with a simple way that comes to my mind and how it can be realized
with db involved. This is a known conflict detection method but note
that I am not insisting to have "earliest_timestamp_wins". Even, if we
want this we can have a separate discussion on this and add it later.

>>
>>
>> > If we use the transaction commit timestamp as basis for resolution, a transaction where multiple rows conflict may end up with different rows affected by that transaction being resolved differently. Say three transactions T1, T2 and T3 on separate origins with timestamps t1, t2, and t3 respectively changed rows r1, r2 and r2, r3 and r1, r4 respectively. Changes to r1 and r2 will conflict. Let's say T2 and T3 are applied first and then T1 is applied. If t2 < t1 < t3, r1 will end up with version of T3 and r2 will end up with version of T1 after applying all the three transactions.
>> >
>>
>> Are you telling the results based on latest_timestamp_wins? If so,
>> then it is correct. OTOH, if the user has configured
>> "earliest_timestamp_wins" resolution method, then we should end up
>> with a version of r1 from T1 because t1 < t3. Also, due to the same
>> reason, we should have version r2 from T2.
>>
>> >
>> Would that introduce an inconsistency between r1 and r2?
>> >
>>
>> As per my understanding, this shouldn't be an inconsistency. Won't it
>> be true even when the transactions are performed on a single node with
>> the same timing?
>>
>
> The inconsistency will arise irrespective of conflict resolution method. On a single system effects of whichever transaction runs last will be visible entirely. But in the example above the node where T1, T2, and T3 (from *different*) origins) are applied, we might end up with a situation where some changes from T1 are applied whereas some changes from T3 are applied.
>

I still think it will lead to the same result if all three T1, T2, T3
happen on the same node in the same order as you mentioned. Say, we
have a pre-existing table with rows r1, r2, r3, r4. Now, if we use the
order of transactions to be applied on the same node based on t2 < t1
< t3. First T2 will be applied, so for now, r1 is a pre-existing
version and r2 is from T2. Next, when T1 is performed, both r1 and r2
are from T1. Lastly, when T3 is applied, r1 will be from T3 and r2
will be from T1. This is what you mentioned will happen after conflict
resolution in the above example.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Li, Yong 2024-06-19 06:44:58 Re: Separate HEAP WAL replay logic into its own file
Previous Message David G. Johnston 2024-06-19 06:02:14 Re: Document NULL