From: | Manfred Koizar <mkoi-pg(at)aon(dot)at> |
---|---|
To: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: nested transactions |
Date: | 2002-11-29 17:03:56 |
Message-ID: | 72ueuukn2vleinke8008vsbcd8o7kqkd2n@4ax.com |
Lists: | pgsql-hackers |
On Thu, 28 Nov 2002 12:59:21 -0500 (EST), Bruce Momjian
<pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>Yes, locking is one possible solution, but no one likes that. One hack
>lock idea would be to create a subtransaction-only lock, [...]
>
>> [...] without
>> having to touch the xids in the tuple headers.
>
>Yes, you could do that, but we can easily just set the clog bits
>atomically,
From what I read above I don't think we can *easily* set more than one
transaction's bits atomically.
> and it will not be needed --- the tuple bits really don't
>help us, I think.
Yes, this is what I said, or at least tried to say. I just wanted to
make clear how this new approach (use the fourth status) differs from
older proposals (replace subtransaction ids in tuple headers).
>OK, we put it in a file. And how do we efficiently clean it up?
>Remember, it is only to be used for a _brief_ period of time. I think a
>file system solution is doable if we can figure out a way not to create
>a file for every xid.
I don't want to create one file for every transaction, but rather a
huge (sparse) array of parent xids, indexed by xid. This array is
divided into manageable chunks, represented by files,
"pg_subtrans_NNNN". These files are only created when necessary. At
any time only a tiny part of the whole array is kept in shared
buffers. This concept is nearly identical to pg_clog, which is an
array of two-bit status entries.
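
As a rough illustration of the array-in-chunks idea, locating a slot could be sketched as follows. The 4-byte entry width, the number of entries per file, the hexadecimal file suffix and both function names are assumptions made for this example; none of them has been decided in this thread.

```python
XID_BYTES = 4                    # assumed width of one parent-xid entry
ENTRIES_PER_SEGMENT = 1 << 20    # assumed number of xids covered by one file


def subtrans_location(xid):
    """Return (file name, byte offset) of the parent-xid slot for xid."""
    segment, index = divmod(xid, ENTRIES_PER_SEGMENT)
    # The hexadecimal NNNN suffix is an assumption, not part of the proposal.
    return "pg_subtrans_%04X" % segment, index * XID_BYTES


def clog_location(xid):
    """Return (byte offset, bit shift) of the two status bits for xid."""
    byte, slot = divmod(xid, 4)  # four 2-bit entries fit in one byte
    return byte, slot * 2
```

Because the array is indexed directly by xid, a lookup is pure arithmetic and needs no search, which is the main attraction of doing it pg_clog style.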
>Maybe we write the xid's to a file in a special directory in sorted
>order, and backends can do a btree search of each file in that directory
>looking for the xid, and then knowing the master xid, look up that
>status, and once all the children xid's are updated, you delete the
>file.
Yes, dense arrays or btrees are other possible implementations. But
for simplicity I'd do it pg_clog style.
>Yes, but again, the xid status of subtransactions is only update just
>before commit of the main transaction, so there is little value to
>having those visible.
Having them visible solves the atomicity problem without requiring
long locks. Updating the status of a single (main or sub) transaction
is atomic, just like it is now.
Here is what is to be done for some operations:

BEGIN main transaction:
    Get a new xid (no change to current behaviour).
    pg_clog[xid] is still 00, meaning active.
    pg_subtrans[xid] is still 0, meaning no parent.

BEGIN subtransaction:
    Push current transaction info onto local stack.
    Get a new xid.
    Record parent xid in pg_subtrans[xid].
    pg_clog[xid] is still 00.

ROLLBACK subtransaction:
    Set pg_clog[xid] to 10 (aborted).
    Optionally set clog bits for subsubtransactions to 10.
    Pop transaction info from stack.

COMMIT subtransaction:
    Set pg_clog[xid] to 11 (committed subtrans).
    Don't touch clog bits for subsubtransactions!
    Pop transaction info from stack.

ROLLBACK main transaction:
    Set pg_clog[xid] to 10 (aborted).
    Optionally set clog bits for subtransactions to 10.

COMMIT main transaction:
    Set pg_clog[xid] to 01 (committed).
    Optionally set clog bits for subtransactions from 11 to 01.
    Don't touch clog bits for aborted subtransactions!
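
The operations above can be modelled in a few lines. This is only a toy sketch: plain dicts stand in for the shared pg_clog and pg_subtrans arrays, the Session class and the starting xid are invented for the example, and all of the "optionally set ..." steps are left out.

```python
import itertools

ACTIVE, COMMITTED, ABORTED, SUBCOMMITTED = 0b00, 0b01, 0b10, 0b11

clog = {}        # xid -> 2-bit status; a missing entry means 00 (active)
subtrans = {}    # xid -> parent xid; a missing entry means 0 (no parent)
_next_xid = itertools.count(100)   # arbitrary starting xid for the demo


class Session:
    """Hypothetical per-backend state: the local transaction stack."""

    def __init__(self):
        self.stack = []    # xids of the enclosing (sub)transactions
        self.xid = None    # xid of the innermost open (sub)transaction

    def begin(self):
        new_xid = next(_next_xid)
        if self.xid is not None:           # BEGIN subtransaction
            self.stack.append(self.xid)    # push current transaction info
            subtrans[new_xid] = self.xid   # record parent xid
        self.xid = new_xid                 # pg_clog[new_xid] stays 00
        return new_xid

    def commit(self):
        # A subtransaction commit writes 11, a main commit writes 01.
        clog[self.xid] = SUBCOMMITTED if self.stack else COMMITTED
        self._pop()

    def rollback(self):
        clog[self.xid] = ABORTED
        self._pop()

    def _pop(self):
        self.xid = self.stack.pop() if self.stack else None
```

Note that every operation writes exactly one pg_clog entry, which is why each of them stays atomic.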
Visibility check by other transactions: If a tuple is visited and its
XMIN/XMAX_IS_COMMITTED/ABORTED flags are not yet set, pg_clog has to
be consulted to find out the status of the inserting/deleting
transaction xid. If pg_clog[xid] is ...

    00: transaction still active
    10: aborted
    01: committed
    11: committed subtransaction, have to check parent

Only in this last case do we have to get parentxid from pg_subtrans.
Now we look at pg_clog[parentxid]. If we find ...

    00: parent still active, so xid is considered active, too
    10: parent aborted, so xid is considered aborted,
        optionally set pg_clog[xid] = 10
    01: parent committed, so xid is considered committed,
        optionally set pg_clog[xid] = 01
    11: recursively check grandparent(s) ...
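
A minimal sketch of this lookup, again with dicts standing in for pg_clog and pg_subtrans and with a made-up function name. It walks the parent chain in a loop instead of recursing, and it always performs the optional write-back of the final status.

```python
ACTIVE, COMMITTED, ABORTED, SUBCOMMITTED = 0b00, 0b01, 0b10, 0b11


def resolve_status(xid, clog, subtrans):
    """Return the effective status (00, 01 or 10) of xid."""
    chain = []                          # xids found in state 11 on the way up
    status = clog.get(xid, ACTIVE)      # a missing entry means 00 (active)
    while status == SUBCOMMITTED:
        chain.append(xid)
        xid = subtrans[xid]             # only needed in the 11 case
        status = clog.get(xid, ACTIVE)
    if status != ACTIVE:
        # Optional side effect: replace 11 with the final status so that
        # later checks need not consult pg_subtrans again.
        for child in chain:
            clog[child] = status
    return status
```

If an ancestor is still active, nothing is written back and the 11 entries stay in place until a later check, the main transaction's COMMIT/ROLLBACK, or VACUUM resolves them.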
For brevity the following operations are not covered in detail:

. Visibility checks for tuples inserted/deleted by a (sub)transaction
  belonging to the current transaction tree (have to check local
  transaction stack whenever we look at an xid or switch to a parent
  xid)
. HeapTupleSatisfiesUpdate (sometimes has to wait for parent
  transaction)
The trick here is that subtransaction status is immediately updated
in pg_clog on commit/abort. Main transaction commit is atomic (just
set its commit bit). Status 11 is short-lived; it is replaced with
the final status by one or more of

- COMMIT/ROLLBACK of the main transaction
- a later visibility check (as a side effect)
- VACUUM
pg_subtrans cleanup: A pg_subtrans_NNNN file covers a known range of
transaction ids. As soon as none of these transactions has a pg_clog
status of 11, the pg_subtrans_NNNN file can be removed. VACUUM can do
this, and it won't even have to check the heap.
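
The removal condition could be checked roughly like this. The function name and the dict-based pg_clog are placeholders, and the sketch follows the stated condition literally: it assumes that no xid in the range is still active and could reach status 11 later.

```python
ACTIVE, COMMITTED, ABORTED, SUBCOMMITTED = 0b00, 0b01, 0b10, 0b11


def segment_removable(first_xid, last_xid, clog):
    """True if no xid in [first_xid, last_xid] has pg_clog status 11."""
    return all(
        clog.get(xid, ACTIVE) != SUBCOMMITTED
        for xid in range(first_xid, last_xid + 1)
    )
```

Only pg_clog is read here, which matches the point that the heap does not have to be checked.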
Servus
Manfred