From: | Jan Wieck <jan(at)wi3ck(dot)info> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, tomas(at)vondra(dot)me |
Subject: | Re: Commit Timestamp and LSN Inversion issue |
Date: | 2024-11-05 13:58:36 |
Message-ID: | 9f7f4def-7ea8-4ad3-83e7-a8cc9d18c58a@wi3ck.info |
Lists: | pgsql-hackers |
Hi Hackers,
On 9/5/24 01:39, Amit Kapila wrote:
> On Wed, Sep 4, 2024 at 6:35 PM Aleksander Alekseev
> <aleksander(at)timescale(dot)com> wrote:
>>
>> > > I don't think you can rely on a system clock for conflict resolution.
>> > > In a corner case a DBA can move the clock forward or backward between
>> > > recordings of Ts1 and Ts2. On top of that there is no guarantee that
>> > > 2+ servers have synchronised clocks. It seems to me that what you are
>> > > proposing will just hide the problem instead of solving it in the
>> > > general case.
>> > >
>> >
>> > It is possible that we can't rely on the system clock for conflict
>> > resolution but that is not the specific point of this thread. As
>> > mentioned in the subject of this thread, the problem is "Commit
>> > Timestamp and LSN Inversion issue". The LSN value and timestamp for a
>> > commit are not generated atomically, so two different transactions can
>> > have them in different order.
>>
>> Hm.... Then I'm having difficulties understanding why this is a
>> problem
>
> This is a potential problem pointed out during discussion of CDR [1]
> (Please read the point starting from "How is this going to deal .."
> and response by Shveta). The point of this thread is that though it
> appears to be a problem but practically there is no scenario where it
> can impact even when we implement "last_write_wins" strategy as
> explained in the initial email. If you or someone sees a problem due
> to LSN<->timestamp inversion then we need to explore the solution for
> it.
>
>>
>> and why it was necessary to mention CDR in this context in the
>> first place.
>>
>> OK, let's forget about CDR completely. Who is affected by the current
>> behavior and why would it be beneficial changing it?
>>
>
> We can't forget CDR completely as this could only be a potential
> problem in that context. Right now, we don't have any built-in
> resolution strategies, so this can't impact but if this is a problem
> then we need to have a solution for it before considering a solution
> like "last_write_wins" strategy.
I agree that we can't forget about CDR. This is precisely the problem we
ran into here at pgEdge and why we came up with a solution (attached).
> Now, instead of discussing LSN<->timestamp inversion issue, you
> started to discuss "last_write_wins" strategy itself which we have
> discussed to some extent in the thread [2]. BTW, we are planning to
> start a separate thread as well just to discuss the clock skew problem
> w.r.t resolution strategies like "last_write_wins" strategy. So, we
> can discuss clock skew in that thread and keep the focus of this
> thread LSN<->timestamp inversion problem.
The fact is that "last_write_wins", combined with some implementation of
Conflict-free Replicated Data Types (CRDTs), is good enough for many
real-world situations. Anything resembling TPC-B or TPC-C is quite happy
with it.
The attached solution is minimally invasive because it doesn't move the
timestamp generation (clock_gettime() call) into the critical section of
ReserveXLogInsertLocation() that is protected by a spinlock. Instead it
keeps track of the last commit-ts written to WAL in shared memory and
simply bumps that by one microsecond if the next one is less than or
equal to it. There is one extra condition in that code section plus a
call through a function pointer for every WAL record. In the unlikely
case of encountering a stalled or backwards-running commit-ts, the
expensive part of recalculating the CRC of the altered commit WAL record
is done later, after releasing the spinlock. I have not been able to
measure any performance impact on a machine with 2x Xeon-Silver (32 HT
cores).
This will work fine until we have systems that can sustain a commit rate
of one million transactions per second or higher for more than a few
milliseconds.
Regards, Jan
Attachment | Content-Type | Size |
---|---|---|
pg18-025-logical_commit_clock.diff | text/x-patch | 9.0 KB |