Re: Conflict Detection and Resolution

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
Subject: Re: Conflict Detection and Resolution
Date: 2024-07-03 05:59:25
Message-ID: CAFiTN-sf23K=sRsnxw-BKNJqg5P6JXcqXBBkx=EULX8QGSQYaw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 3, 2024 at 11:00 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> > Yes, I also think it should be independent of CDR. IMHO, it should be
> > based on the user-configured maximum clock skew tolerance and can be
> > independent of CDR.
>
> +1
>
> > IIUC we would make the remote apply wait just
> > before committing if the remote commit timestamp is ahead of the local
> > clock by more than the maximum clock skew tolerance, is that correct?
>
> +1 on condition to wait.
>
> But I think we should make the apply worker wait during begin
> (apply_handle_begin) instead of commit. It makes more sense to delay
> the entire operation to manage clock skew rather than the commit
> alone. Only then can CDR's timestamp-based resolution, which happens
> well before the commit stage, benefit from this. Thoughts?
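A minimal sketch of the wait condition discussed above, in C. The names `TimestampUs` and `skew_wait_needed()` are hypothetical, not existing PostgreSQL functions. The sketch assumes the delay should last only until the skew falls back within the tolerance; waiting until the local clock fully reaches the remote commit_ts is an equally plausible reading. The helper does not decide where it is called, so it fits either apply_handle_begin() or the commit path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timestamp type: microseconds since some epoch, modeled
 * loosely on PostgreSQL's int64-based TimestampTz. */
typedef int64_t TimestampUs;

/*
 * Hypothetical helper: returns true if the apply worker should delay
 * because the remote commit timestamp is ahead of the local clock by
 * more than max_skew_us. *wait_us receives the delay (assumption: just
 * enough to bring the skew back within the tolerance), or 0 if no wait
 * is needed.
 */
static bool
skew_wait_needed(TimestampUs remote_commit_ts, TimestampUs local_now,
                 int64_t max_skew_us, int64_t *wait_us)
{
    int64_t ahead_by = remote_commit_ts - local_now;

    assert(max_skew_us >= 0);

    if (ahead_by > max_skew_us)
    {
        *wait_us = ahead_by - max_skew_us;
        return true;
    }

    /* Remote is behind us, or ahead but within the tolerance. */
    *wait_us = 0;
    return false;
}
```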

But do we really need to wait in apply_handle_begin()? If we already
know the commit_ts, we can perform the conflict resolution at that
point, no? We should wait before committing because we consider this
remote transaction to be in the future, and to preserve causal order
we do not want to confirm its commit to the remote node before the
local clock reaches the record's commit_ts. However, we can still
perform conflict resolution beforehand since we already know the
commit_ts. The conflict resolution function will be something like
"out_version = CRF(version1_commit_ts, version2_commit_ts)", so the
result should be the same regardless of when we apply it, correct?
From a performance standpoint, wouldn't it be beneficial to do as much
work as possible in advance? By the time we have applied all the
operations, the local clock may already have caught up with the
commit_ts of the remote transaction. Am I missing something?
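To illustrate why the timing should not matter, here is a minimal sketch of a last-writer-wins resolver, assuming resolution depends only on the two versions' commit timestamps. `RowVersion`, `origin_id` and `crf_last_writer_wins()` are hypothetical names; the origin_id tie-break is an added assumption so that equal timestamps still resolve deterministically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t TimestampUs;   /* hypothetical: microseconds since an epoch */

/* Hypothetical row version as seen by a conflict resolver. */
typedef struct RowVersion
{
    TimestampUs commit_ts;   /* commit timestamp of the writing transaction */
    uint32_t    origin_id;   /* hypothetical node id, used only for ties */
    const char *value;
} RowVersion;

/*
 * Hypothetical last-writer-wins CRF: a pure function of the two versions'
 * commit timestamps (origin_id breaks ties). It never reads the local
 * clock, so evaluating it at BEGIN or just before COMMIT gives the same
 * winner.
 */
static const RowVersion *
crf_last_writer_wins(const RowVersion *local, const RowVersion *remote)
{
    assert(local != NULL && remote != NULL);

    if (remote->commit_ts != local->commit_ts)
        return remote->commit_ts > local->commit_ts ? remote : local;
    return remote->origin_id > local->origin_id ? remote : local;
}
```

Because the resolver is independent of when it runs, only the final commit would need to be held back until the clock skew is within bounds.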

However, while thinking about this, I'm wondering how we will handle
streaming of in-progress transactions. If we apply them with parallel
apply workers, we might not know their commit_ts, since they may not
have been committed yet. One simple option could be to prevent
parallel workers from applying in-progress transactions when CDR is
configured. Instead, we could let these transactions spill to files
and apply them only once we receive the commit record.
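The fallback proposed above can be sketched as a small mode-selection function. The enum values loosely mirror the subscription `streaming` option (off / on / parallel); `StreamMode`, `effective_stream_mode()` and the `cdr_enabled` flag are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical streaming modes, loosely mirroring streaming = off/on/parallel. */
typedef enum StreamMode
{
    STREAM_OFF,      /* no streaming: transaction is sent after it commits */
    STREAM_SPILL,    /* streaming = on: spill changes to files, apply at commit */
    STREAM_PARALLEL  /* streaming = parallel: apply changes as they arrive */
} StreamMode;

/*
 * Hypothetical policy: when CDR is configured, parallel apply would run
 * before the commit_ts is known, so downgrade it to spilling and apply
 * the transaction only after the commit record (and its commit_ts) arrives.
 */
static StreamMode
effective_stream_mode(StreamMode requested, bool cdr_enabled)
{
    assert(requested <= STREAM_PARALLEL);

    if (cdr_enabled && requested == STREAM_PARALLEL)
        return STREAM_SPILL;
    return requested;
}
```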

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
