Re: Proposal for CSN based snapshots

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Markus Wanner <markus(at)bluegap(dot)ch>
Subject: Re: Proposal for CSN based snapshots
Date: 2014-05-12 13:56:51
Message-ID: 5370D323.1070606@vmware.com
Lists: pgsql-hackers

On 01/24/2014 02:10 PM, Rajeev rastogi wrote:
> We are also planning to implement CSN based snapshot.
> So I am curious to know whether any further development is happening on this.

I started looking into this, and plan to work on this for 9.5. It's a
big project, so any help is welcome. The design I have in mind is to use
the LSN of the commit record as the CSN (as Greg Stark suggested).

Some problems and solutions I have been thinking of:

The core of the design is to store the LSN of the commit record in
pg_clog. Currently, we only store 2 bits per transaction there,
indicating if the transaction committed or not, but the patch will
expand it to 64 bits, to store the LSN. To check the visibility of an
XID in a snapshot, the XID's commit LSN is looked up in pg_clog, and
compared with the snapshot's LSN.
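
A minimal sketch of that lookup, in C. Everything named here is
hypothetical and not part of any patch: the in-memory array standing in
for pg_clog, the two reserved values, and the function names. It also
assumes the comparison is "commit LSN strictly below snapshot LSN";
whether < or <= is right depends on whether the start or the end of the
commit record is stored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;                 /* an LSN */

/* Hypothetical stand-in for a pg_clog widened to 64 bits per XID. */
#define CLOG_SKETCH_SLOTS 1024
#define CSN_INPROGRESS ((XLogRecPtr) 0)      /* assumption: no commit LSN yet */
#define CSN_ABORTED    ((XLogRecPtr) 1)      /* assumption: reserved value */

static XLogRecPtr clog_sketch[CLOG_SKETCH_SLOTS];

/* Record the outcome of a transaction: a real commit LSN or CSN_ABORTED. */
static void
clog_sketch_set(TransactionId xid, XLogRecPtr csn)
{
    assert(xid < CLOG_SKETCH_SLOTS);
    clog_sketch[xid] = csn;
}

/* An XID is visible if it committed before the snapshot's LSN. */
static bool
xid_visible_in_snapshot(TransactionId xid, XLogRecPtr snapshot_lsn)
{
    XLogRecPtr  csn;

    assert(xid < CLOG_SKETCH_SLOTS);
    csn = clog_sketch[xid];

    if (csn == CSN_INPROGRESS || csn == CSN_ABORTED)
        return false;
    return csn < snapshot_lsn;
}
```

With commit LSNs 500 and 900 recorded for two XIDs, a snapshot taken at
LSN 700 sees the first and not the second.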

Currently, before consulting the clog for an XID's status, it is
necessary to first check if the transaction is still in progress by
scanning the proc array. To get rid of that requirement, just before
writing the commit record in the WAL, the backend will mark the clog
slot with a magic value that says "I'm just about to commit". After
writing the commit record, it is replaced with the record's actual LSN.
If a backend sees the magic value in the clog, it will wait for the
transaction to finish the insertion, and then check again to get the
real LSN. I'm thinking of just using XactLockTableWait() for that. This
mechanism makes the insertion of a commit WAL record and updating the
clog appear atomic to the rest of the system.
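
The two-step protocol could look roughly like the following C sketch.
The slot array, the CSN_COMMITTING constant and the three function names
are made up for illustration. The reader does not block here; it reports
that the caller has to wait (for example with XactLockTableWait(), as
suggested above) and then retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

#define SKETCH_SLOTS 1024
#define CSN_INPROGRESS ((XLogRecPtr) 0)   /* assumption: nothing recorded yet */
#define CSN_COMMITTING ((XLogRecPtr) 2)   /* the "just about to commit" magic value */

/* Hypothetical stand-in for the 64-bit clog slots. */
static _Atomic XLogRecPtr csn_slot[SKETCH_SLOTS];

/* Step 1: called just before the commit record is written to the WAL. */
static void
commit_begin(TransactionId xid)
{
    assert(xid < SKETCH_SLOTS);
    atomic_store(&csn_slot[xid], CSN_COMMITTING);
}

/* Step 2: called after the commit record is written, with its LSN. */
static void
commit_finish(TransactionId xid, XLogRecPtr commit_lsn)
{
    assert(xid < SKETCH_SLOTS);
    assert(commit_lsn > CSN_COMMITTING);  /* real LSNs avoid reserved values */
    atomic_store(&csn_slot[xid], commit_lsn);
}

/*
 * Returns false if the slot holds the magic value, meaning the caller
 * must wait for the committer and call again. Otherwise stores the slot
 * contents in *csn and returns true.
 */
static bool
try_read_commit_lsn(TransactionId xid, XLogRecPtr *csn)
{
    XLogRecPtr  value;

    assert(xid < SKETCH_SLOTS);
    value = atomic_load(&csn_slot[xid]);
    if (value == CSN_COMMITTING)
        return false;
    *csn = value;
    return true;
}
```

A reader therefore never observes a commit record in the WAL paired with
an "in progress" slot: it sees either the old state, the magic value, or
the final LSN.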

With this mechanism, taking a snapshot is just a matter of reading the
current WAL insertion point. There is no need to scan the proc array,
which is good. However, it probably still makes sense to record an xmin
and an xmax in SnapshotData, for performance reasons. An xmax, in
particular, will allow us to skip checking the clog for transactions
that will surely not be visible. We will no longer track the latest
completed XID or the xmin like we do today, but we can use
ShmemVariableCache->nextXid as a conservative value for xmax, and keep
a cached global xmin value in shared memory, updated when convenient,
that can be just copied to the snapshot.
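
To illustrate the xmax shortcut, here is a small C sketch. SnapshotSketch,
take_snapshot(), the commit_lsn_of array and the lookup counter are all
hypothetical names, the three inputs are passed in rather than read from
shared memory, and XID wraparound is ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

typedef struct SnapshotSketch
{
    TransactionId xmin;   /* copied from the cached global xmin; unused below */
    TransactionId xmax;   /* conservative: the next XID to be assigned */
    XLogRecPtr    lsn;    /* WAL insertion point when the snapshot was taken */
} SnapshotSketch;

#define SKETCH_SLOTS 1024
static XLogRecPtr commit_lsn_of[SKETCH_SLOTS];  /* 0 = not committed */
static unsigned   clog_lookups;                 /* counts simulated clog reads */

/* Taking a snapshot only copies three values; no proc array scan. */
static SnapshotSketch
take_snapshot(XLogRecPtr insert_lsn, TransactionId next_xid,
              TransactionId global_xmin)
{
    SnapshotSketch snap = {global_xmin, next_xid, insert_lsn};

    return snap;
}

static bool
xid_visible(const SnapshotSketch *snap, TransactionId xid)
{
    XLogRecPtr  csn;

    /* Assigned at or after xmax: surely not visible, skip the clog. */
    if (xid >= snap->xmax)
        return false;

    assert(xid < SKETCH_SLOTS);
    clog_lookups++;
    csn = commit_lsn_of[xid];
    return csn != 0 && csn < snap->lsn;
}
```

The counter shows the point of keeping xmax around: XIDs at or above it
are rejected without touching the clog at all.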

In theory, we could use a snapshot LSN as the cutoff-point for
HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but
that makes me feel uneasy. In any case, I think we'll need a cut-off
point defined as an XID rather than an LSN for freezing purposes. In
particular, we need a cut-off XID to determine how far the pg_clog can
be truncated, and to store in relfrozenxid. So, we will still need the
concept of a global oldest xmin.

When a snapshot is just an LSN, taking a snapshot can no longer
calculate an xmin, like we currently do (there will be a snapshot LSN in
place of an xmin in the proc array). So we will need a new mechanism to
calculate the global oldest xmin. First, scan the proc array to find the
oldest XID that is still in progress. That XID minus one becomes the new
global oldest xmin, but only after all currently active snapshots have
finished. We don't want to sleep in GetOldestXmin() waiting for those
snapshots to finish, so we should advance a system-wide oldest-xmin value
periodically, for example whenever the walwriter process wakes up. That
way, whenever we need an oldest-xmin value, a fairly recently calculated
one is always ready in shared memory.
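
One possible shape for that periodic step, as a C sketch. The mail does
not say how "all currently active snapshots have finished" would be
detected; this sketch assumes it can be done by remembering the WAL
insertion point at which a candidate was computed and waiting until the
oldest active snapshot LSN has reached it. OldestXminState and both
function names are invented, and XID wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t XLogRecPtr;

typedef struct OldestXminState
{
    TransactionId published_xmin;  /* value readers get without waiting */
    TransactionId candidate_xmin;  /* computed, not yet safe to publish */
    XLogRecPtr    candidate_lsn;   /* insertion point at computation time */
    bool          have_candidate;
} OldestXminState;

/* Oldest in-progress XID, or next_xid if nothing is running. */
static TransactionId
oldest_in_progress(const TransactionId *xids, size_t n, TransactionId next_xid)
{
    TransactionId oldest = next_xid;

    for (size_t i = 0; i < n; i++)
        if (xids[i] < oldest)
            oldest = xids[i];
    return oldest;
}

/*
 * Called periodically. oldest_snapshot_lsn is the smallest snapshot LSN
 * still in use (pass insert_lsn if there are no active snapshots).
 */
static void
advance_oldest_xmin(OldestXminState *st,
                    const TransactionId *running, size_t nrunning,
                    TransactionId next_xid, XLogRecPtr insert_lsn,
                    XLogRecPtr oldest_snapshot_lsn)
{
    /* Every snapshot that predates the candidate is gone: publish it. */
    if (st->have_candidate && oldest_snapshot_lsn >= st->candidate_lsn)
    {
        st->published_xmin = st->candidate_xmin;
        st->have_candidate = false;
    }

    /* Start a new round: oldest in-progress XID minus one. */
    if (!st->have_candidate)
    {
        TransactionId oldest = oldest_in_progress(running, nrunning, next_xid);

        assert(oldest > 0);
        st->candidate_xmin = oldest - 1;
        st->candidate_lsn = insert_lsn;
        st->have_candidate = true;
    }
}
```

Nothing in here sleeps: a caller that needs an oldest-xmin value just
reads published_xmin, which lags behind by at most one round.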

- Heikki
