Re: Remove HeapTuple and Buffer dependency for predicate locking functions

From: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Subject: Re: Remove HeapTuple and Buffer dependency for predicate locking functions
Date: 2019-08-07 18:53:39
Message-ID: CALfoeivj_fQb72HQXV8J+iTDGvYyFK521h7mPey4OAiSgROykQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 2, 2019 at 4:56 PM Ashwin Agrawal <aagrawal(at)pivotal(dot)io> wrote:

>
> On Wed, Jul 31, 2019 at 2:06 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> Looking at the code as of master, we currently have:
>>
>
> Super awesome feedback and insights, thank you!
>
> - PredicateLockTuple() calls SubTransGetTopmostTransaction() to figure
>> out a whether the tuple has been locked by the current
>> transaction. That check afaict just should be
>> TransactionIdIsCurrentTransactionId(), without all the other
>> stuff that's done today.
>>
>
> Agree. v1-0002 patch attached does that now. Please let me know if that's
> what you meant.
>
> TransactionIdIsCurrentTransactionId() imo ought to be optimized to
>> always check for the top level transactionid first - that's a good bet
>> today, but even moreso for the upcoming AMs that won't have separate
>> xids for subtransactions. Alternatively we shouldn't make that a
>> binary search for each subtrans level, but just have a small
>> simplehash hashtable for xids.
>>
>
> v1-0001 patch checks for GetTopTransactionIdIfAny() first in
> TransactionIdIsCurrentTransactionId() which seems yes better in general and
> more for future. That mostly meets the needs for current discussion.
>
> The alternative of not using binary search seems bigger refactoring and
> should be handled as separate optimization exercise outside of this thread.
>
>
>> - CheckForSerializableConflictOut() wants to get the toplevel xid for
>> the tuple, because that's the one the predicate hashtable stores.
>>
>> In your patch you've already moved the HTSV() call etc out of
>> CheckForSerializableConflictOut(). I'm somewhat inclined to think that
>> the SubTransGetTopmostTransaction() call ought to go along with that.
>> I don't really think that belongs in predicate.c, especially if
>> most/all new AMs don't use subtransaction ids.
>>
>> The only downside is that currently the
>> TransactionIdEquals(xid, GetTopTransactionIdIfAny()) check
>> avoids the SubTransGetTopmostTransaction() check.
>>
>> But again, the better fix for that seems to be to improve the generic
>> code. As written the check won't prevent a subtrans lookup for heap
>> when subtransactions are in use, and it's IME pretty common for tuples
>> to get looked at again in the transaction that has created them. So
>> I'm somewhat inclined to think that SubTransGetTopmostTransaction()
>> should have a fast-path for the current transaction - probably just
>> employing TransactionIdIsCurrentTransactionId().
>>
>
> That optimization, as Kuntal also mentioned, seems something which can be
> done on-top afterwards on current patch.
>
>
>> I don't really see what we gain by having the subtrans handling in the
>> predicate code. Especially given that we've already moved the HTSV()
>> handling out, it seems architecturally the wrong place to me - but I
>> admit that that's a fuzzy argument. The relevant mapping should be one
>> line in the caller.
>>
>
> Okay, I moved the sub transaction handling out of
> CheckForSerializableConflictOut() and have it along side HTSV() now.
>
> The reason I felt leaving subtransaction handling in generic place, was it
> might be premature to thing no future AM will need it. Plus, all
> serializable function api's having same expectations is easier. Like
> PredicateLockTuple() can be passed top or subtransaction id and it can
> handle it but with the change CheckForSerializableConflictOut() only be
> feed top transaction ID. But its fine and can see the point of AM needing
> it can easily get top transaction ID and feed it as heap.
>
>
>> I wonder if it'd be wroth to combine the
>> TransactionIdIsCurrentTransactionId() calls in the heap cases that
>> currently do both, PredicateLockTuple() and
>> HeapCheckForSerializableConflictOut(). The heap_fetch() case probably
>> isn't commonly that hot a pathq, but heap_hot_search_buffer() is.
>>
>
> Maybe, will give thought to it separate from the current patch.
>
>
>> Minor notes:
>> - I don't think 'insert_xid' is necessarily great - it could also be the
>> updating xid etc. And while you can argue that an update is an insert
>> in the current heap, that's not the case for future AMs.
>> - to me
>> @@ -1621,7 +1622,8 @@ heap_hot_search_buffer(ItemPointer tid, Relation
>> relation, Buffer buffer,
>> if (valid)
>> {
>> ItemPointerSetOffsetNumber(tid, offnum);
>> - PredicateLockTuple(relation, heapTuple,
>> snapshot);
>> + PredicateLockTID(relation,
>> &(heapTuple)->t_self, snapshot,
>> +
>> HeapTupleHeaderGetXmin(heapTuple->t_data));
>> if (all_dead)
>> *all_dead = false;
>> return true;
>>
>> What are those parens - as placed they can't do anything. Did you
>> intend to write &(heapTuple->t_self)? Even that is pretty superfluous,
>> but it at least clarifies the precedence.
>>
>
> Fixed. No idea what I was thinking there, mostly feel I intended to have
> it as like &(heapTuple->t_self).
>
> I'm also a bit confused why we don't need to pass in the offset of the
>> current tuple, rather than the HOT root tuple here. That's not related
>> to this patch. But aren't we locking the wrong tuple here, in case of
>> HOT?
>>
>
> Yes, root is being locked here instead of the HOT. But I don't have full
> context on the same. If we wish to fix it though, can be easily done now
> with the patch by passing "tid" instead of &(heapTuple->t_self).
>
> - I wonder if CheckForSerializableConflictOutNeeded() shouldn't have a
>> portion of it's code as a static inline. In particular, it's a shame
>> that we currently perform external function calls at quite the
>> frequency when serializable isn't even in use.
>>
>
> I am not sure on portion of the code part? SerializationNeededForRead() is
> static inline function in C file. Can't inline
> CheckForSerializableConflictOutNeeded() without moving
> SerializationNeededForRead() and some other variables to header file.
> CheckForSerializableConflictOut() wasn't inline either, so a function call
> was performed earlier as well when serializable isn't even in use.
>
> I understand that with refactor, HeapCheckForSerializableConflictOut() is
> called which calls CheckForSerializableConflictOutNeeded(). If that's the
> problem, for addressing the same, I had proposed alternative way to
> refactor. CheckForSerializableConflictOut() can take callback function and
> void* callback argument for AM specific check instead. So, the flow would
> be AM calling CheckForSerializableConflictOut() as today and only if
> serializable in use will invoke the callback to check with AM if more work
> should be performed or not. Essentially
> HeapCheckForSerializableConflictOut() will become callback function
> instead. Due to void* callback argument aspect I didn't like that solution
> and felt AM performing checks and calling CheckForSerializableConflictOut()
> seems more straight forward.
>

Attaching re-based version of the patches on top of current master, which
has the fix for HOT serializable predicate locking bug spotted by Andres
committed now.

Attachment Content-Type Size
v2-0001-Optimize-TransactionIdIsCurrentTransactionId.patch text/x-patch 1.1 KB
v2-0002-Optimize-PredicateLockTuple.patch text/x-patch 1.7 KB
v2-0003-Remove-HeapTuple-dependency-for-predicate-locking.patch text/x-patch 25.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2019-08-07 19:29:28 Re: crash 11.5~ (and 11.4)
Previous Message Simon Riggs 2019-08-07 18:28:17 Re: Problem with default partition pruning