CLOG extension

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: CLOG extension
Date: 2012-05-03 16:37:54
Message-ID: CA+Tgmoa6aZkd=HY=o41UrZqRXHFie5i3kw5HZRCEWhy0VXaXFg@mail.gmail.com
Lists: pgsql-hackers

Currently, the following can happen:

1. A backend needs a new transaction, so it calls
GetNewTransactionId(). It acquires XidGenLock and then calls
ExtendCLOG().
2. ExtendCLOG() decides that a new CLOG page is needed, so it acquires
CLogControlLock and then calls ZeroCLOGPage().
3. ZeroCLOGPage() calls WriteZeroPageXlogRec(), which calls XLogInsert().
4. XLogInsert() acquires WALInsertLock and then calls AdvanceXLInsertBuffer().
5. AdvanceXLInsertBuffer() sees that WAL buffers may be full and
acquires WALWriteLock to check, and possibly to write WAL if the
buffers are in fact full.

At this point, we have a single backend simultaneously holding
XidGenLock, CLogControlLock, WALInsertLock, and WALWriteLock, which
from a concurrency standpoint is, at the risk of considerable
understatement, not so great. The situation is no better if (as seems
to be more typical) we block waiting for WALWriteLock rather than
actually holding it ourselves: either way, nobody can perform any
WAL-logged operation, get an XID, or consult CLOG - so all write
activity is blocked, and read activity will block too as soon as
it hits an unhinted tuple. This leads to a couple of questions.

First, do we really need to WAL-log CLOG extension at all? Perhaps
recovery should simply extend CLOG when it hits a commit or abort
record that references a page that doesn't exist yet.

Second, is there any harm in pre-extending CLOG? Currently, we don't
extend CLOG until we get to the point where the XID we're allocating
is on a page that doesn't exist yet, so no further XIDs can be
assigned until the extension is complete. We could avoid that by
extending a page in advance. Right now, whenever a backend rolls onto
a new CLOG page, it must first create it. What we could do instead is
try to stay one page ahead of whatever we're currently using: whenever
a backend rolls onto a new CLOG page, it creates *the next page*.
That way, it can release XidGenLock first and *then* call
ExtendCLOG(). That allows all the other backends to continue
allocating XIDs in parallel with the CLOG extension. In theory we
could still get a pile-up if the entire page worth of XIDs gets used
up before we can finish the extension, but that should be pretty rare.

(Alternatively, we could introduce a separate background process to
extend CLOG, and just have foreground processes kick it periodically.
This currently seems like overkill to me.)

Third, assuming we do need to write WAL, can we somehow rejigger the
logging so that we need not hold CLogControlLock while we're writing
it, so that other backends can still do CLOG lookups during that time?
Maybe when we take CLogControlLock and observe that extension is
needed, we can release CLogControlLock, WAL-log the extension, and
then retake CLogControlLock to do SimpleLruZeroPage(). We might need
a separate CLogExtensionLock to make sure that two different backends
aren't trying to do this dance at the same time, but that should be
largely uncontended.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
