Re: Low hanging fruit in lazy-XID-assignment patch?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Low hanging fruit in lazy-XID-assignment patch?
Date: 2007-09-07 22:15:41
Message-ID: 23762.1189203341@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here's some revised text for the README file, based on using Florian's
idea of a global latestCompletedXid variable. As I worked through it
I realized that in this design, XidGenLock gates entry of new XIDs into
the ProcArray while ProcArrayLock gates their removal. Which is an
interesting sort of symmetry property. It also turns out that the
reason we need to gate entry with XidGenLock is to keep from breaking
GetOldestXmin, rather than to ensure correctness of snapshots per se.

(Note: I refer in the text to ProcArrayEndTransaction(), which is a
function I'm thinking of putting into procarray.c to replace the current
inline-in-xact.c code that clears xid and related fields.)

Comments?

regards, tom lane

Interlocking transaction begin, transaction end, and snapshots
--------------------------------------------------------------

We try hard to minimize the amount of overhead and lock contention involved
in the frequent activities of beginning/ending a transaction and taking a
snapshot. Unfortunately, we must have some interlocking for this, because
we must ensure consistency about the commit order of transactions.
For example, suppose an UPDATE in xact A is blocked by xact B's prior
update of the same row, and xact B is doing commit while xact C gets a
snapshot. Xact A can complete and commit as soon as B releases its locks.
If xact C's GetSnapshotData sees xact B as still running, then it had
better see xact A as still running as well, or it will be able to see two
tuple versions - one deleted by xact B and one inserted by xact A. Another
reason why this would be bad is that C would see (in the row inserted by A)
earlier changes by B, and it would be inconsistent for C not to see any
of B's changes elsewhere in the database.

Formally, the correctness requirement is "if a snapshot A considers
transaction X as committed, and any of transaction X's snapshots considered
transaction Y as committed, then snapshot A must consider transaction Y as
committed".

What we actually enforce is strict serialization of commits and rollbacks
with snapshot-taking: we do not allow any transaction to exit the set of
running transactions while a snapshot is being taken. (This rule is
stronger than necessary for consistency, but is relatively simple to
enforce, and it assists with some other issues as explained below.) The
implementation of this is that GetSnapshotData takes the ProcArrayLock in
shared mode (so that multiple backends can take snapshots in parallel),
but ProcArrayEndTransaction must take the ProcArrayLock in exclusive mode
while clearing MyProc->xid at transaction end (either commit or abort).

ProcArrayEndTransaction also holds the lock while advancing the shared
latestCompletedXid variable. This allows GetSnapshotData to use
latestCompletedXid + 1 as xmax for its snapshot: there can be no
transaction >= this xid value that the snapshot needs to consider as
completed.

In short, then, the rule is that no transaction may exit the set of
currently-running transactions between the time we fetch latestCompletedXid
and the time we finish building our snapshot. However, this restriction
only applies to transactions that have an XID --- read-only transactions
can end without acquiring ProcArrayLock, since they don't affect anyone
else's snapshot nor latestCompletedXid.

Transaction start, per se, doesn't have any interlocking with these
considerations, since we no longer assign an XID immediately at transaction
start. But when we do decide to allocate an XID, GetNewTransactionId must
store the new XID into the shared ProcArray before releasing XidGenLock.
This ensures that all top-level XIDs <= latestCompletedXid are either
present in the ProcArray, or not running anymore. (This guarantee doesn't
apply to subtransaction XIDs, because of the possibility that there's not
room for them in the subxid array; instead we guarantee that they are
present or the overflow flag is set.) If a backend released XidGenLock
before storing its XID into MyProc, then it would be possible for another
backend to allocate and commit a later XID, causing latestCompletedXid to
pass the first backend's XID, before that value became visible in the
ProcArray. That would break GetOldestXmin, as discussed below.

We allow GetNewTransactionId to store the XID into MyProc->xid (or the
subxid array) without taking ProcArrayLock. This was once necessary to
avoid deadlock; while that is no longer the case, it's still beneficial for
performance. We are thereby relying on fetch/store of an XID to be atomic,
else other backends might see a partially-set XID. This also means that
readers of the ProcArray xid fields must be careful to fetch a value only
once, rather than assume they can read it multiple times and get the same
answer each time.

Another important activity that uses the shared ProcArray is GetOldestXmin,
which must determine a lower bound for the oldest xmin of any active MVCC
snapshot, system-wide. Each individual backend advertises the smallest
xmin of its own snapshots in MyProc->xmin, or zero if it currently has no
live snapshots (eg, if it's between transactions or hasn't yet set a
snapshot for a new transaction). GetOldestXmin takes the MIN() of the
valid xmin fields. It does this with only shared lock on ProcArrayLock,
which means there is a potential race condition against other backends
doing GetSnapshotData concurrently: we must be certain that a concurrent
backend that is about to set its xmin does not compute an xmin less than
what GetOldestXmin returns. We ensure that by including all the active
XIDs into the MIN() calculation, along with the valid xmins. The rule that
transactions can't exit without taking exclusive ProcArrayLock ensures that
concurrent holders of shared ProcArrayLock will compute the same minimum of
currently-active XIDs: no xact, in particular not the oldest, can exit
while we hold shared ProcArrayLock. So GetOldestXmin's view of the minimum
active XID will be the same as that of any concurrent GetSnapshotData, and
so it can't produce an overestimate. If there is no active transaction at
all, GetOldestXmin returns latestCompletedXid + 1, which is a lower bound
for the xmin that might be computed by concurrent or later GetSnapshotData
calls. (We know that no XID less than this could be about to appear in
the ProcArray, because of the XidGenLock interlock discussed above.)

GetSnapshotData also performs an oldest-xmin calculation (which had better
match GetOldestXmin's) and stores that into RecentGlobalXmin, which is used
for some tuple age cutoff checks where a fresh call of GetOldestXmin seems
too expensive. Note that while it is certain that two concurrent
executions of GetSnapshotData will compute the same xmin for their own
snapshots, as argued above, it is not certain that they will arrive at the
same estimate of RecentGlobalXmin. This is because we allow XID-less
transactions to clear their MyProc->xmin asynchronously (without taking
ProcArrayLock), so one execution might see what had been the oldest xmin,
and another not. This is OK since RecentGlobalXmin need only be a valid
lower bound. As noted above, we are already assuming that fetch/store
of the xid fields is atomic, so assuming it for xmin as well is no extra
risk.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Avery Payne 2007-09-07 22:38:58 Re: A Silly Idea for Vertically-Oriented Databases
Previous Message Hannu Krosing 2007-09-07 22:02:04 Re: [FEATURE REQUEST] Streaming Onlinebackup (Maybe OFFTOPIC)