From: Neil Conway <neilc(at)samurai(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RFC: bufmgr locking changes
Date: 2004-01-07 22:37:09
Message-ID: 87k7431j3e.fsf@mailbox.samurai.com
Lists: pgsql-hackers
I've attached a (gzip'ed) patch that makes the following changes to
the buffer manager:
(1) Overhaul locking; whenever the code needs to modify the state
of an individual buffer, do synchronization via a per-buffer
"meta data lock" rather than the global BufMgrLock. For more
info on the motivation for this, see the several recent
-hackers threads on the subject.
(2) Reduce superfluous dirtying of pages: previously,
LockBuffer(buf, LW_EXCLUSIVE) would automatically dirty the
buffer. This behavior has been removed, to reduce the number
of pages that are dirtied. For more info on the motivation for
this, see the previous -hackers thread on the topic.
(3) Simplify the code in a bunch of places
This basically involved rewriting bufmgr.c (which makes the patch a
little illegible; it might be easier to just apply it and read through
the new code). That also means there are still a fair number of bugs
and unresolved issues.
The new locking works as follows:
- modifying the shared buffer table (buf_table.c) or making calls
into the freelist code (freelist.c) still requires holding the
BufMgrLock. There was some talk of trying to add more granular
locking to freelist.c itself: if there is still significant
contention for the BufMgrLock after these changes have been
applied, I'll take a look at doing that.
- to modify a BufferDesc's meta data (shared refcount, flags, tag,
etc.), one must hold the buffer's "meta data lock". Also, I
removed the existing "io_in_progress" lock; instead, we now hold
the buffer's meta data lock while doing buffer I/O.
- if a backend holds the BufMgrLock, it will never try to
LWLockAcquire() a per-buffer meta data lock, due to the risk of
deadlock (and the loss in concurrency if we got blocked waiting
while still holding the BufMgrLock). Instead it does a
LWLockConditionalAcquire() and handles an acquisition failure sanely
(there's a sketch of this pattern just after this list).
- if a backend holds the meta data lock for a buffer, it CAN
attempt to LWLockAcquire() the BufMgrLock. This is safe from
deadlocks, due to the previous paragraph.
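To make the conditional-acquire rule concrete, the pattern looks
roughly like this (a sketch, not a verbatim excerpt from the patch:
I'm calling the per-buffer lock field "meta_data_lock" here just for
illustration, and setting BM_DIRTY stands in for whatever meta data
change the caller actually wants to make):

    #include "postgres.h"
    #include "storage/buf_internals.h"
    #include "storage/lwlock.h"

    /* Sketch: called with BufMgrLock already held. */
    static void
    set_flag_with_bufmgrlock_held(BufferDesc *buf)
    {
        if (LWLockConditionalAcquire(buf->meta_data_lock, LW_EXCLUSIVE))
        {
            /* got the buffer's meta data lock without blocking */
            buf->flags |= BM_DIRTY;
            LWLockRelease(buf->meta_data_lock);
            return;
        }

        /*
         * The lock holder might itself be waiting for BufMgrLock, so
         * we must not block on the buffer while still holding it.
         * Drop BufMgrLock, wait for the buffer, then re-take
         * BufMgrLock; that ordering (meta data lock, then BufMgrLock)
         * is the one that's allowed.
         */
        LWLockRelease(BufMgrLock);
        LWLockAcquire(buf->meta_data_lock, LW_EXCLUSIVE);
        LWLockAcquire(BufMgrLock, LW_EXCLUSIVE);

        /*
         * A real caller would recheck the buffer's state here, since
         * it may have changed while BufMgrLock was released.
         */
        buf->flags |= BM_DIRTY;
        LWLockRelease(buf->meta_data_lock);
    }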
The code is still a work-in-progress (running `pgbench -s 30 -c 20 -t
1000` bails out pretty quickly, for example), but any comments on
whether this scheme is in-theory correct would be very welcome.
For #2, my understanding of the existing XLOG code is incomplete, so
perhaps someone can tell me if I'm on the right track. I've modified a
few of the XLogInsert() call sites so that they now:
- acquire an exclusive buffer cntx_lock
- modify the content of the page/buffer
- WriteNoReleaseBuffer() to mark the buffer as dirty; since we
still hold the buffer's cntx_lock, a checkpoint process can't
write this page to disk yet. This replaces the implicit "mark
this page as dirty" behavior of LockBuffer()
- do the XLogInsert()
- release the cntx_lock
- ReleaseBuffer() to decrement the buffer's pincount
For example, sequence.c now works like this (I've probably missed a
few places) -- is this correct, and/or is there a better way to do it?
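In outline, the pattern at those call sites is the following (again a
sketch, not the literal sequence.c code: the page modification and the
rdata setup for the WAL record are elided, and PageSetLSN is kept where
the existing call sites do it, while the lock is still held):

    #include "postgres.h"
    #include "access/rmgr.h"
    #include "access/xlog.h"
    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"

    /* Sketch of a patched XLogInsert() call site. */
    static void
    modify_page_and_log(Buffer buf, RmgrId rmid, uint8 info,
                        XLogRecData *rdata)
    {
        Page        page = BufferGetPage(buf);
        XLogRecPtr  recptr;

        /* 1. take the exclusive cntx_lock (no longer dirties the page) */
        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

        /* 2. ... modify the page contents here ... */

        /*
         * 3. mark the buffer dirty explicitly; since we still hold
         * the cntx_lock, a checkpoint can't write the page out from
         * under us
         */
        WriteNoReleaseBuffer(buf);

        /* 4. write the WAL record and stamp the page with its LSN */
        recptr = XLogInsert(rmid, info, rdata);
        PageSetLSN(page, recptr);

        /* 5. release the cntx_lock */
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);

        /* 6. drop our pin */
        ReleaseBuffer(buf);
    }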
Notes and caveats:
- Removed SetBufferCommitInfoNeedsSave(). AFAICS, this is now
completely equivalent to WriteNoReleaseBuffer(), so I just removed
the former and replaced all the calls to it with calls to the latter.
- Moved the code for flushing a dirty buffer to disk to buf_io.c
- Moved UnpinBuffer() and PinBuffer() to bufmgr.c, from freelist.c
- Removed the following now-unused buffer flag bits: BM_VALID,
BM_DELETED, BM_JUST_DIRTIED, and BM_IO_IN_PROGRESS; also shrank the
'flags' field of the BufferDesc from 16 bits to 8
- Removed the cntxDirty field from the BufferDesc: now that we don't
need to acquire the BufMgrLock to modify the buffer's flags, there's
no reason to keep this around
- Made 'PrivateRefCount' an array of uint16s, rather than longs. This
saves 2 bytes * shared_buffers per backend on 32-bit machines and 6
bytes * shared_buffers per backend on some 64-bit machines. It means
a given backend can only pin a single buffer 65,535 times, but that
should be more than enough. Similarly, made LocalRefCount an array
of uint16s.
I was thinking of adding overflow checking to the refcount increment
code to make sure we fail safely if a backend *does* try to exceed
this number of pins, but I can't imagine a scenario in which it would
actually occur, so I haven't bothered. (A sketch of what such a check
might look like follows these notes.)
- Removed the BufferLocks array. AFAICS it wasn't actually necessary.
- A few loose ends in the code still need to be wrapped up (for
example, I need to take another glance at the pin-waiter stuff, and
the error handling still needs some more work), but I think most of
the functionality is there. Areas of concern are denoted by 'XXX'
comments.
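Regarding the refcount overflow check mentioned above: if I do end up
adding it, it would be something small along these lines, at the top
of PinBuffer() in bufmgr.c (the error message wording is just
illustrative):

    int     b = BufferDescriptorGetBuffer(buf) - 1;

    /* fail safely rather than wrapping the uint16 around */
    if (PrivateRefCount[b] >= 0xFFFF)
        elog(ERROR, "cannot pin buffer %d more than 65535 times in one backend",
             b + 1);
    PrivateRefCount[b]++;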
Any comments?
-Neil
Attachment: buffer-locking-45.patch.gz (application/octet-stream, 31.6 KB)