From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info> |
Subject: | Re: Commit Timestamp and LSN Inversion issue |
Date: | 2024-11-12 04:21:03 |
Message-ID: | CAA4eK1L6Vpf9EmL2bpfD3kdvM6_2XQqwJ9GY6uY-hgcot76cqA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Nov 12, 2024 at 8:35 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Nov 11, 2024 at 9:05 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> >
> > On 11/11/24 09:19, Amit Kapila wrote:
> > >
> > > I can't think of a solution other than the current proposal where we
> > > do both the operations (reserve WAL space for commit and adjust
> > > commit_timestamp, if required) together to resolve LSN<->Timestamp
> > > invertion issue. Please let us know if you have any other ideas for
> > > solving this. Even, if we want to change ReserveXLogInsertLocation to
> > > make the locked portion an atomic operation, we can still do it for
> > > records other than commit. Now, if we don't have any other solution
> > > for LSN<->Timestamp inversion issue, changing the entire locked
> > > portion to atomic will close the door to solving this problem forever
> > > unless we have some other way to solve it which can make it difficult
> > > to rely on commit_timestamps for certain scenarios.
> > >
> >
> > I don't know what the solution is, isn't the problem that
> >
> > (a) we record both values (LSN and timestamp) during commit
> >
> > (b) reading timestamp from system clock can be quite expensive
> >
> > It seems to me that if either of those two issues disappeared, we
> > wouldn't have such an issue.
> >
> > For example, imagine getting a timestamp is super cheap - just reading
> > and updating a simple counter from shmem, just like we do for the LSN.
> > Wouldn't that solve the problem?
> >
>
> Yeah, this is exactly what I thought.
>
> > For example, let's say XLogCtlInsert has two fields
> >
> > int64 CommitTimestamp;
> >
> > and that ReserveXLogInsertLocation() also does this for each commit:
> >
> > commit_timestamp = Insert->commit_timestamp++;
> >
> > while holding the insertpos_lock. Now we have the LSN and timestamp
> > perfectly correlated.
> >
>
> Right, and the patch sent by Hou-San [1] (based on the original patch
> by Jan) is somewhat on these lines. The idea you have shared or
> implemented by the patch is a logical clock stored in shared memory.
> So, what the patch does is that if the time recorded by the current
> commit record is lesser than or equal to the logical clock (which
> means after we record time in the commit code path and before we
> reserve the LSN, there happens a concurrent transaction), we shall
> increment the value of logical clock by one and use that as commit
> time.
>
> So, in most cases, we need to perform one additional "if check" and
> "an assignment" under spinlock, and that too only for the commit
> record. In some cases, when inversion happens, there will be "one
> increment" and "two assignments."
>
As the inversion issue can mainly hamper logical replication-based
solutions we can do any of this additional work under spinlock only
when the current record is a commit record (which the currently
proposed patch is already doing) and "wal_level = logical" and also
can have another option at the subscription level to enable this new
code path. I am not sure what is best but just sharing the ideas here.
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2024-11-12 04:22:50 | Re: Commit Timestamp and LSN Inversion issue |
Previous Message | Michael Paquier | 2024-11-12 03:41:19 | Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes |