From: | "Florian G(dot) Pflug" <fgp(at)phlo(dot)org> |
---|---|
To: | Postgresql-Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Maybe some more low-hanging fruit in the latestCompletedXid patch. |
Date: | 2007-09-09 23:50:05 |
Message-ID: | 46E486AD.3060002@phlo.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi
I've already posted this idea, but I feel that I did explain it
rather badly. So here comes a new try.
Currently, we do not assume that either the childXids array, nor
the xid cache in the proc array are sorted by ascending xid order.
I believe that we could simplify the code, further reduce the locking
requirements, and enabled a transaction to de-overflow it's xid cache
if we assume that those arrays are in ascending xid order.
** Sortedness **
Presumably, the existing code *already* guarantees that both the childXids
and the xid cache *are* in fact sorted by ascending xid order - due to
the following reasons:
A transaction (including subtransactions) form a tree. Each xact (or node)
may have any number of children - assume that they are added "left to right"
in the order of their creation. For ever (sub)xact, all subtrees but the
rightmost one is "closed" (meaning that no new nodes will ever be created inside
them). That last assertion follows from the fact that each non-rightmost subtree
is either aborted, or subcommitted.
Since adding a new sibling to a (sub) xact is only possible if all the other
subtrees are subcommitted or aborted, it will surely get a higher xid than any
node in any sibling subtree. Since the xids in the subcommitted siblings where
already added to the original (sub) xact upon subcommit, the childXids of that
xact will have the following structure:
<xids of 1. subcommited child> <xids of 2. subcommited child> ...
Each of the sublists are guaranteed to include only xids larger than those
inside their predecessors - which leads to a completely sorted array in
the whole.
The proc-array cache is surely sorted by ascending xid order, because we
add entries while we generate them (until we overflow).
** Benefits ***
Since we know that both the childXids and the xid cache array are sorted,
removing entries from the cache becomes much easier. Since we can only abort
a rightmost subtree (All others are either already aborted, or subcommitted),
we know that the to-be-aborted xids are the N largest xids in our whole
transactions. To remove them from the proc array we overwrite their xids
with InvalidTransactionId from right-to-left, and afterwards set the new
array length.
This allows as to replace the ExclusiveLock with a ShareLock. Since we
overwrite the xids right-to-left, a snapshot take concurrently will miss
some of the right-most xids. This is exactly the same situation as if
we had aborted those xids one-by-one, instead of all in one go.
(Xmin calculation are completely unaffected - the toplevel xid *not*
removed this way, and that one is surely smaller than any subxact xid)
The rather easy xid-cache remove algorithms will also detect if a previously
overflowed cache will be de-overflowed by the removal. (This needs a little
bit more checking - de-overflowing the xid cache is a bit trickey because
we must not race with TransactionIdIsInProgress. But that seems doable,
it'll just need the right ordering of things - maybe updating the CLOG
*before* updating the proc array is already sufficient even).
If there is interest in doing this, I can come up with a patch.
greetings, Florian Pflug
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2007-09-10 01:40:13 | Re: GucContext of log_autovacuum |
Previous Message | Florian G. Pflug | 2007-09-09 22:52:04 | Re: WIP patch for latestCompletedXid method of computing snapshot xmax |