Re: Commit Timestamp and LSN Inversion issue

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>
Subject: Re: Commit Timestamp and LSN Inversion issue
Date: 2024-11-11 15:35:14
Message-ID: 11ee3489-40ba-45d6-ba6a-7c3def725e84@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/11/24 09:19, Amit Kapila wrote:
> On Fri, Nov 8, 2024 at 8:23 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>>
>> On 2024-11-08 09:08:55 +0000, Zhijie Hou (Fujitsu) wrote:
>>>>
>>>> I think it's *completely* unacceptable to call a hook inside
>>>> ReserveXLogInsertLocation, with the spinlock held, no less. That's the most
>>>> contended section of code in postgres.
>>>
>>> I understand your concern and appreciate the feedback. I've made some
>>> adjustments to the patch by directly placing the code to adjust the commit
>>> timestamp within the spinlock, aiming to keep it as efficient as possible. The
>>> changes have resulted in just a few extra lines. Would this approach be
>>> acceptable to you ?
>>
>> No, not at all. I think it's basically not acceptable to add any nontrivial
>> instruction to the locked portion of ReserveXLogInsertLocation().
>>
>> Before long we're going to have to replace that spinlocked instruction with
>> atomics, which will make it impossible to just block while holding the lock
>> anyway.
>>
>> IMO nothing that touches core xlog insert infrastructure is acceptable for
>> this.
>>
>
> I can't think of a solution other than the current proposal where we
> do both the operations (reserve WAL space for commit and adjust
> commit_timestamp, if required) together to resolve LSN<->Timestamp
> invertion issue. Please let us know if you have any other ideas for
> solving this. Even, if we want to change ReserveXLogInsertLocation to
> make the locked portion an atomic operation, we can still do it for
> records other than commit. Now, if we don't have any other solution
> for LSN<->Timestamp inversion issue, changing the entire locked
> portion to atomic will close the door to solving this problem forever
> unless we have some other way to solve it which can make it difficult
> to rely on commit_timestamps for certain scenarios.
>

I don't know what the solution is, isn't the problem that

(a) we record both values (LSN and timestamp) during commit

(b) reading timestamp from system clock can be quite expensive

It seems to me that if either of those two issues disappeared, we
wouldn't have such an issue.

For example, imagine getting a timestamp is super cheap - just reading
and updating a simple counter from shmem, just like we do for the LSN.
Wouldn't that solve the problem?

For example, let's say XLogCtlInsert has two fields

int64 CommitTimestamp;

and that ReserveXLogInsertLocation() also does this for each commit:

commit_timestamp = Insert->commit_timestamp++;

while holding the insertpos_lock. Now we have the LSN and timestamp
perfectly correlated. Of course, this timestamp would be completely
detached from "normal" timestamps, it'd just be a sequence. But we could
also once in a while "set" the timestamp to GetCurrentTimestamp(), or
something like that. For example, there could be a "ticker" process,
doing that every 1000ms, or 100ms, or whatever we think is enough.

Or maybe it could be driven by the commit itself, say when some timeout
expires, we assign too many items since the last commit, ...

Alternatively, we could simply stop relying on the timestamps recorded
in the commit, and instead derive "monotonic" commit timestamps after
the fact. For example, we could track timestamps for some subset of
commits, and then approximate the rest using LSN.

AFAIK the inversion can happen only for concurrent commits, and there's
can't be that many of those. So if we record the (LSN, timestamp) for
every 1MB of WAL, we approximate timestamps for commits in that 1MB by
linear approximation.

Of course, there's a lot of questions and details to solve - e.g. how
often would it need to happen, when exactly would it happen, etc. And
also how would that integrate with the logical decoding - it's easy to
just get the timestamp from the WAL record, this would require more work
to actually calculate it. It's only a very rough idea.

regards

--
Tomas Vondra

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message dinesh salve 2024-11-11 15:42:20 explain plans for foreign servers
Previous Message Tom Lane 2024-11-11 15:35:04 Re: PG17 failing tests (DST related?)