Re: Commit Timestamp and LSN Inversion issue

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>
Subject: Re: Commit Timestamp and LSN Inversion issue
Date: 2024-11-12 04:21:03
Message-ID: CAA4eK1L6Vpf9EmL2bpfD3kdvM6_2XQqwJ9GY6uY-hgcot76cqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 12, 2024 at 8:35 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Nov 11, 2024 at 9:05 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> >
> > On 11/11/24 09:19, Amit Kapila wrote:
> > >
> > > I can't think of a solution other than the current proposal where we
> > > do both the operations (reserve WAL space for commit and adjust
> > > commit_timestamp, if required) together to resolve LSN<->Timestamp
> > > invertion issue. Please let us know if you have any other ideas for
> > > solving this. Even, if we want to change ReserveXLogInsertLocation to
> > > make the locked portion an atomic operation, we can still do it for
> > > records other than commit. Now, if we don't have any other solution
> > > for LSN<->Timestamp inversion issue, changing the entire locked
> > > portion to atomic will close the door to solving this problem forever
> > > unless we have some other way to solve it which can make it difficult
> > > to rely on commit_timestamps for certain scenarios.
> > >
> >
> > I don't know what the solution is, isn't the problem that
> >
> > (a) we record both values (LSN and timestamp) during commit
> >
> > (b) reading timestamp from system clock can be quite expensive
> >
> > It seems to me that if either of those two issues disappeared, we
> > wouldn't have such an issue.
> >
> > For example, imagine getting a timestamp is super cheap - just reading
> > and updating a simple counter from shmem, just like we do for the LSN.
> > Wouldn't that solve the problem?
> >
>
> Yeah, this is exactly what I thought.
>
> > For example, let's say XLogCtlInsert has two fields
> >
> > int64 CommitTimestamp;
> >
> > and that ReserveXLogInsertLocation() also does this for each commit:
> >
> > commit_timestamp = Insert->commit_timestamp++;
> >
> > while holding the insertpos_lock. Now we have the LSN and timestamp
> > perfectly correlated.
> >
>
> Right, and the patch sent by Hou-San [1] (based on the original patch
> by Jan) is somewhat on these lines. The idea you have shared or
> implemented by the patch is a logical clock stored in shared memory.
> So, what the patch does is that if the time recorded by the current
> commit record is lesser than or equal to the logical clock (which
> means after we record time in the commit code path and before we
> reserve the LSN, there happens a concurrent transaction), we shall
> increment the value of logical clock by one and use that as commit
> time.
>
> So, in most cases, we need to perform one additional "if check" and
> "an assignment" under spinlock, and that too only for the commit
> record. In some cases, when inversion happens, there will be "one
> increment" and "two assignments."
>

As the inversion issue can mainly hamper logical replication-based
solutions we can do any of this additional work under spinlock only
when the current record is a commit record (which the currently
proposed patch is already doing) and "wal_level = logical" and also
can have another option at the subscription level to enable this new
code path. I am not sure what is best but just sharing the ideas here.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-11-12 04:22:50 Re: Commit Timestamp and LSN Inversion issue
Previous Message Michael Paquier 2024-11-12 03:41:19 Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes