From: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Alexander Korotkov <aekorotkov(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: table_tuple_lock's snapshot argument |
Date: | 2025-03-10 16:20:54 |
Message-ID: | CAEudQAr-0oLQ3WdmEdN854EE2BhPzS_ftKbHJ1hw2trdK=oQFA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Em seg., 10 de mar. de 2025 às 12:22, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
escreveu:
> On 21/01/2025 12:05, Alexander Korotkov wrote:
> > On Tue, Jan 21, 2025 at 12:03 PM Alexander Korotkov
> > <aekorotkov(at)gmail(dot)com> wrote:
> >> On Sun, Dec 29, 2024 at 3:59 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
> wrote:
> >>> However, I think GetLatestSnapshot() is wrong here anyway. None of this
> >>> matters for heapam which just ignores the 'snapshot' argument, but
> let's
> >>> assume a different AM that would use the snapshot to identify the tuple
> >>> version. The TID was fetched from an index using an index scan with
> >>> SnapshotDirty. There's no guarantee that the LatestSnapshot would match
> >>> the same tuple version that the index scan found. If an MVCC snapshot
> is
> >>> needed, surely it should be acquired before the index scan, and used
> for
> >>> the index scan as well.
> >>>
> >>> I see three options:
> >>>
> >>> 1. Remove the 'snapshot' argument, since it's not used by heapam. If we
> >>> get a table AM where a single TID represents multiple row versions,
> this
> >>> will need to be revisited.
> >>>
> >>> 2. Rewrite the recheck code in execReplication.c so that it uses the
> >>> snapshot in a more consistent fashion. Call GetLatestSnapshot() first,
> >>> and use the same snapshot in the index scan and table_tuple_lock().
> >>> Acquiring a snapshot isn't free though, so it would be nice to avoid
> >>> doing that when the heapam is just going to ignore it anyway. If we go
> >>> with this option, I think we could reuse the snapshot that is already
> >>> active in most cases, and only take a new snapshot if the tuple was
> >>> concurrently updated.
> >>>
> >>> 3. Modify the tableam interface so that the index scan can return a
> more
> >>> unique identifier of the tuple version. In heapam, it could be the TID
> >>> like today, but a different AM could return some other token. Continue
> >>> to use SnapshotDirty in the index scan, but in the call to
> >>> table_tuple_lock(), instead of passing GetLatestSnapshot() and TID,
> pass
> >>> the token you got index_getnext_slot().
> >>
> >> Thank you for your finding and analysis. I would vote for #3.
> >> Currently snapshot argument is not properly used, and the heapam code
> >> doesn't check it. I think allowing flexible tuple identifier is
> >> essential for future of table AM API. Therefore, any
> >> snapshot/visibility information could be put inside that tuple
> >> identifier if needed.
> >
> > To be more specific, I vote for #1 + #3. Remove the snapshot argument
> > for now, then work on flexible tuple identifier.
>
> I committed and backpatched the trivial fix for not, just replacing
> GetLatestSnapshot() with GetActiveSnapshot(). I got cold feet on just
> removing the argument without providing a replacement, because it's not
> 100% clear to me if the current situation is broken or not.
>
> To summarize the issue, RelationFindReplTupleByIndex() performs an index
> scan with DirtySnapshot, and then calls table_tuple_lock() with a fresh
> MVCC snapshot to lock the row. There's no guarantee that the index scan
> found the same row version that table_tuple_lock() will lock, if the TID
> alone doesn't uniquely identify it. But that's still OK as long as the
> key column hasn't changed, like with heapam's HOT updates. I couldn't
> convince myself that it's broken nor that it's guaranteed to be correct
> with other AMs.
>
Just for curiosity.
*RelationFindReplTupleSeq* on the same source, does not suffer
from the same issue?
PushActiveSnapshot(GetLatestSnapshot());
res = table_tuple_lock(rel, &(outslot->tts_tid), GetLatestSnapshot(),
best regards,
Ranier Vilela
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2025-03-10 16:52:50 | Re: table_tuple_lock's snapshot argument |
Previous Message | Tom Lane | 2025-03-10 16:18:29 | Re: Printing window function OVER clauses in EXPLAIN |