Re: Commit Timestamp and LSN Inversion issue

From: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, tomas(at)vondra(dot)me
Subject: Re: Commit Timestamp and LSN Inversion issue
Date: 2024-09-09 06:11:13
Message-ID: CABdArM71Q6GYMBs19c_dVrwLub3CsaFNmR9Orf5c-agG6W1xFA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 4, 2024 at 12:23 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>
> Hello hackers,
> (Cc people involved in the earlier discussion)
>
> I would like to discuss the $Subject.
>
> While discussing Logical Replication's Conflict Detection and
> Resolution (CDR) design in [1], it came to our notice that the
> commit LSN and timestamp may not correlate perfectly, i.e., commits may
> happen with LSN1 < LSN2 but with Ts1 > Ts2. This issue may arise
> because, during the commit process, the timestamp (xactStopTimestamp)
> is captured slightly earlier than when space is reserved in the WAL.
>
> ~~
>
> Reproducibility of conflict-resolution problem due to the timestamp inversion
> ------------------------------------------------
> It was suggested that timestamp inversion *may* impact the time-based
> resolutions such as last_update_wins (targeted to be implemented in
> [1]) as we may end up making wrong decisions if timestamps and LSNs
> are not correctly ordered. And thus we tried some tests but failed to
> find any practical scenario where it could be a problem.
>
> Basically, the proposed conflict resolution is a row-level resolution,
> and to cause the row value to be inconsistent, we need to modify the
> same row in concurrent transactions and commit the changes
> concurrently. But this doesn't seem possible because concurrent
> updates on the same row are disallowed (e.g., the later update will be
> blocked due to the row lock). See [2] for the details.
>
> We also considered multi-table cases, e.g., updating table A, which
> has a foreign key, and updating table B, which table A refers to. But
> the update on table A blocks the update on table B as well, so we
> could not reproduce data divergence due to the LSN/timestamp mismatch
> issue there.
>
> ~~
>
> Idea proposed to fix the timestamp inversion issue
> ------------------------------------------------
> There was a suggestion in [3] to acquire the timestamp while reserving
> the space (because that happens in LSN order). The clock would need to
> be monotonic (easy enough with CLOCK_MONOTONIC), but also cheap. The
> main reason it's done outside the critical section is that
> gettimeofday() may be quite expensive. There's a concept of hybrid
> clock, combining "time" and logical counter, which might be useful
> independently of CDR.
>
> On further analyzing this idea, we found that CLOCK_MONOTONIC is
> accepted only by clock_gettime(), which has higher precision than
> gettimeofday() and is thus, in theory, equally or more expensive (we
> plan to test it and post the results). It does not look like a good
> idea to call either of these while holding the spinlock that reserves
> the WAL position. As for the suggested "hybrid clock" solution, it
> might not help here because the logical counter is only used to order
> transactions with the same timestamp. The problem here is how to get
> the timestamp along with the WAL position reservation
> (ReserveXLogInsertLocation).
>
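
As a side note on the "hybrid clock" mentioned above, here is a minimal
sketch of the idea, assuming a stamp made of a physical time plus a
logical counter. All names (HybridStamp, HybridClock, hybrid_clock_next,
hybrid_stamp_cmp) are hypothetical and do not exist in PostgreSQL. It
only illustrates that the counter orders stamps whose physical time does
not advance; it says nothing about how to take the stamp together with
the WAL position reservation.

```c
#include <stdint.h>

/* Hypothetical stamp: physical time plus a logical tie-breaker. */
typedef struct HybridStamp
{
    uint64_t physical;   /* e.g. microseconds from the system clock */
    uint32_t counter;    /* bumped when physical time does not advance */
} HybridStamp;

/* Hypothetical clock state: the last stamp handed out. */
typedef struct HybridClock
{
    HybridStamp last;
} HybridClock;

/*
 * Return the next stamp given the current physical reading.
 * If the physical clock moved forward, adopt it and reset the counter;
 * otherwise keep the old physical value and bump the counter, so that
 * successive stamps are strictly increasing.
 */
static HybridStamp
hybrid_clock_next(HybridClock *hc, uint64_t physical_now)
{
    if (physical_now > hc->last.physical)
    {
        hc->last.physical = physical_now;
        hc->last.counter = 0;
    }
    else
        hc->last.counter++;

    return hc->last;
}

/* Order stamps by physical time first, then by counter. */
static int
hybrid_stamp_cmp(HybridStamp a, HybridStamp b)
{
    if (a.physical != b.physical)
        return (a.physical < b.physical) ? -1 : 1;
    if (a.counter != b.counter)
        return (a.counter < b.counter) ? -1 : 1;
    return 0;
}
```

The sketch is not thread-safe as written; a shared clock would still
need some form of synchronization around hybrid_clock_next().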

Here are the tests done to compare clock_gettime() and gettimeofday()
performance.

Machine details:
Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
CPU(s): 120; 800GB RAM

Three timing calls were tested at three call volumes (1 million,
100 million, and 1 billion calls):
1) clock_gettime() with CLOCK_REALTIME
2) clock_gettime() with CLOCK_MONOTONIC
3) gettimeofday()
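
For readers who do not want to open the attachment, a timing loop of
this kind can be sketched roughly as below. This is not the attached
script, only a minimal illustration under my own assumptions: the helper
names (elapsed_seconds, bench_clock_gettime, bench_gettimeofday) are
made up, and the loop itself is timed with CLOCK_MONOTONIC.

```c
#define _XOPEN_SOURCE 700   /* expose clock_gettime() and gettimeofday() */
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Seconds elapsed between two timespec readings. */
static double
elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
    return (double) (end->tv_sec - start->tv_sec) +
           (double) (end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Time n back-to-back clock_gettime(clock_id) calls; returns seconds. */
static double
bench_clock_gettime(clockid_t clock_id, long n)
{
    struct timespec start, end, ts;

    assert(n >= 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < n; i++)
        clock_gettime(clock_id, &ts);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return elapsed_seconds(&start, &end);
}

/* Time n back-to-back gettimeofday() calls; returns seconds. */
static double
bench_gettimeofday(long n)
{
    struct timespec start, end;
    struct timeval tv;

    assert(n >= 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < n; i++)
        gettimeofday(&tv, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return elapsed_seconds(&start, &end);
}
```

Calling, for example, bench_clock_gettime(CLOCK_MONOTONIC, 1000000) from
main() and printing the result with "%.5f seconds" gives numbers in the
same form as the results below.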

--> clock_gettime() with CLOCK_MONOTONIC is sometimes slightly faster,
but not consistently. The difference in time taken across all three is
minimal: at each call volume the averages differ by no more than
~2.5%, and every variant costs roughly 17-18 ns per call. Overall,
clock_gettime() with CLOCK_MONOTONIC and gettimeofday() perform
essentially the same.

Below are the test results (each test was run twice for consistency):

1) For 1 million calls:
1a) clock_gettime() with CLOCK_REALTIME:
- Run 1: 0.01770 seconds, Run 2: 0.01772 seconds, Average: 0.01771 seconds.
1b) clock_gettime() with CLOCK_MONOTONIC:
- Run 1: 0.01753 seconds, Run 2: 0.01748 seconds, Average: 0.01750 seconds.
1c) gettimeofday():
- Run 1: 0.01742 seconds, Run 2: 0.01777 seconds, Average: 0.01760 seconds.

2) For 100 million calls:
2a) clock_gettime() with CLOCK_REALTIME:
- Run 1: 1.76649 seconds, Run 2: 1.76602 seconds, Average: 1.76625 seconds.
2b) clock_gettime() with CLOCK_MONOTONIC:
- Run 1: 1.72768 seconds, Run 2: 1.72988 seconds, Average: 1.72878 seconds.
2c) gettimeofday():
- Run 1: 1.72436 seconds, Run 2: 1.72174 seconds, Average: 1.72305 seconds.

3) For 1 billion calls:
3a) clock_gettime() with CLOCK_REALTIME:
- Run 1: 17.63859 seconds, Run 2: 17.65529 seconds, Average:
17.64694 seconds.
3b) clock_gettime() with CLOCK_MONOTONIC:
- Run 1: 17.15109 seconds, Run 2: 17.27406 seconds, Average:
17.21257 seconds.
3c) gettimeofday():
- Run 1: 17.21368 seconds, Run 2: 17.22983 seconds, Average:
17.22175 seconds.
~~~~
The scripts used for the tests are attached.

--
Thanks,
Nisha

Attachment Content-Type Size
clock_gettime_test.zip application/x-zip-compressed 2.0 KB
