RE: Commit Timestamp and LSN Inversion issue

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Jan Wieck <jan(at)wi3ck(dot)info>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, "tomas(at)vondra(dot)me" <tomas(at)vondra(dot)me>
Subject: RE: Commit Timestamp and LSN Inversion issue
Date: 2024-11-07 02:30:22
Message-ID: OS0PR01MB57162A0406EFBB133C9F747E945C2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, November 5, 2024 9:59 PM Jan Wieck <jan(at)wi3ck(dot)info> wrote:
>
> Hi Hackers,
>
> On 9/5/24 01:39, Amit Kapila wrote:
> >
> > We can't forget CDR completely as this could only be a potential
> > problem in that context. Right now, we don't have any built-in
> > resolution strategies, so this can't impact but if this is a problem
> > then we need to have a solution for it before considering a solution
> > like "last_write_wins" strategy.
>
> I agree that we can't forget about CDR. This is precisely the problem we ran into
> here at pgEdge and why we came up with a solution (attached).
>
> > Now, instead of discussing LSN<->timestamp inversion issue, you
> > started to discuss "last_write_wins" strategy itself which we have
> > discussed to some extent in the thread [2]. BTW, we are planning to
> > start a separate thread as well just to discuss the clock skew problem
> > w.r.t resolution strategies like "last_write_wins" strategy. So, we
> > can discuss clock skew in that thread and keep the focus of this
> > thread LSN<->timestamp inversion problem.
>
> Fact is that "last_write_wins" together with some implementation of Conflict
> free Replicated Data Types (CRDT) is good enough for many real world
> situations. Anything resembling a TPC-B or TPC-C is quite happy with it.
>
> The attached solution is minimally invasive because it doesn't move the
> timestamp generation (clock_gettime() call) into the critical section of
> ReserveXLogInsertLocation() that is protected by a spinlock. Instead it keeps
> track of the last commit-ts written to WAL in shared memory and simply bumps
> that by one microsecond if the next one is below or equal.
> There is one extra condition in that code section plus a function call by pointer
> for every WAL record. In the unlikely case of encountering a stalled or
> backwards running commit-ts, the expensive part of recalculating the CRC of
> the altered commit WAL-record is done later, after releasing the spinlock. I have
> not been able to measure any performance impact on a machine with 2x
> Xeon-Silver (32 HT cores).
>
> This will work fine until we have systems that can sustain a commit rate of one
> million transactions per second or higher for more than a few milliseconds.

Thanks for the patch! I am reading the patch and noticed few minor things.

1.
+ /*
+ * This is a local transaction. Make sure that the xact_time
+ * higher than any timestamp we have seen thus far.
+ *
+ * TODO: This is not postmaster restart safe. If the local
+ * system clock is further behind other nodes than it takes
+ * for the postmaster to restart (time between it stops
+ * accepting new transactions and time when it becomes ready
+ * to accept new transactions), local transactions will not
+ * be bumped into the future correctly.
+ */

The TODO section mentions other nodes, but I believe think patch currently do
not have the handling of timestamps for other nodes. Should we either remove
this part or add a brief explanation to clarify the relationship with other
nodes?

2.
+/*
+ * Hook function to be called while holding the WAL insert spinlock.
+ * to adjust commit timestamps via Lamport clock if needed.
+ */

The second line seems can be improved:
"to adjust commit timestamps .." => "It adjusts commit timestamps ..."

Best Regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tender Wang 2024-11-07 02:57:08 Re: Remove an obsolete comment in gistinsert()
Previous Message Nathan Bossart 2024-11-07 02:26:47 Re: Popcount optimization using AVX512