From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info> |
Subject: | Re: Commit Timestamp and LSN Inversion issue |
Date: | 2024-11-13 08:56:37 |
Message-ID: | CAA4eK1LZf5G2_HhqdHFC5Y67i2xmL=nV33FX-f5svXLMCQ4Omw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Nov 12, 2024 at 5:30 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> On 11/12/24 04:05, Amit Kapila wrote:
> >
> > Right, and the patch sent by Hou-San [1] (based on the original patch
> > by Jan) is somewhat on these lines. The idea you have shared or
> > implemented by the patch is a logical clock stored in shared memory.
> > So, what the patch does is that if the time recorded by the current
> > commit record is lesser than or equal to the logical clock (which
> > means after we record time in the commit code path and before we
> > reserve the LSN, there happens a concurrent transaction), we shall
> > increment the value of logical clock by one and use that as commit
> > time.
> >
> > So, in most cases, we need to perform one additional "if check" and
> > "an assignment" under spinlock, and that too only for the commit
> > record. In some cases, when inversion happens, there will be "one
> > increment" and "two assignments."
> >
>
> In a way, yes. Except that each backend first sets the commit timestamp
> the usual way too. And then it also has to recalculate the checksum of
> the record - which happens under lwlock and not the main spinlock, but
> it still seems it might be quite expensive.
>
> I wonder how often we'd need to do this - it seems to me that for high
> throughput systems we might easily get into a situation where we have to
> "correct" the timestamp and recalculate the checksum (so calculating it
> twice) for most of the commits.
>
> Would be good to see some numbers from experiments - how often this can
> happen, what's the impact on throughput etc.
>
The initial numbers shared by Kuroda-San [1] didn't show any
significant impact due to this. It will likely happen in some cases
under stress but shouldn't be often because normally the backend that
acquired the timestamp first should reserve the LSN first as well. We
can do more tests as well to see the impact but the key point Andres
is raising is that we won't be able to convert the operation in
ReserveXLogInsertLocation() to use atomics after such a solution. Now,
even, if the change is only in the *commit* code path, we may or may
not be able to maintain two code paths for reserving LSN, one using
atomics and the other (for commit record) using spinlock.
I think even if we consider your solution to directly increment the
commit_timestamp in spinlock to make it monotonic and then later
rectify the commit_ts, will it be any more helpful in making the
entire operation atomics?
> Ofc, you may argue that this would only affect people with the hook, and
> those people may be OK with the impact. And in a way I agree. But I also
> understand the concerns expressed by Andres, that adding a hook here may
> be problematic / easy to get wrong, and then we'd be the ones holding
> the brown bag ...
>
BTW, the latest patch [2] doesn't have any hook.
> >> Of course, this timestamp would be completely
> >> detached from "normal" timestamps, it'd just be a sequence. But we could
> >> also once in a while "set" the timestamp to GetCurrentTimestamp(), or
> >> something like that. For example, there could be a "ticker" process,
> >> doing that every 1000ms, or 100ms, or whatever we think is enough.
> >>
> >> Or maybe it could be driven by the commit itself, say when some timeout
> >> expires, we assign too many items since the last commit, ...
> >>
> >
> > If we follow the patch, we don't need this additional stuff. Isn't
> > what is proposed [1] quite similar to your idea? If so, we can save
> > this extra maintenance of commit timestamps.
> >
>
> Not sure. But I think we already have some of the necessary parts in
> commit_ts. It'd need some improvements, but it might also be a good
> place to adjust the timestamps. The inversion can only happens within a
> small group of recent transactions (due to NUM_XLOGINSERT_LOCKS), so it
> shouldn't be hard to keep a small buffer for those XIDs. Ofc, subxacts
> would complicate stuff, and it'd mean the timestamps written to WAL and
> to commit_ts would differ. Not great.
>
Won't it be difficult to predict the number of transactions that could
be impacted in this model?
In general, I understand the concern here related to the difficulty in
converting the operation in ReserveXLogInsertLocation() to atomics but
leaving the LSN<->Timestamp inversion issue open forever also doesn't
give any good feeling. I feel any solution to fixup commit_timestamps
afterward as we discussed above [3] would be much more complex and
difficult to maintain.
[1] - https://www.postgresql.org/message-id/TYAPR01MB569222C1312E7A6C3C63539DF5582%40TYAPR01MB5692.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/OS0PR01MB57162A227EC357482FEB4470945D2%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[3] - https://www.postgresql.org/message-id/CAA4eK1%2BOZLh7vA1CQkoq0ba4J_P-8JFHnt0a_YC2xfB0t3%2BakA%40mail.gmail.com
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | vignesh C | 2024-11-13 09:29:25 | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Previous Message | Peter Eisentraut | 2024-11-13 08:53:36 | Re: doc: pgevent.dll location |