From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-14 00:55:34
Message-ID: BANLkTik-15v5d3FLMJk_utYNkCyLHbXajA@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 13, 2011 at 4:16 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> The key is putting a rapid hard stop to all fast-path lock acquisitions and
> then reconstructing a valid global picture of the affected lock table regions.
> Your 1024-way table of strong lock counts sounds promising. (Offhand, I do
> think they would need to be counts, not just flags.)
>
> If I'm understanding correctly, your pseudocode would look roughly like this:
>
> if (level >= ShareUpdateExclusiveLock)
>     ++strong_lock_counts[my_strong_lock_count_partition]
>     sfence
>     if (strong_lock_counts[my_strong_lock_count_partition] == 1)
>         /* marker 1 */
>         import_all_local_locks
>     normal_LockAcquireEx
> else if (level <= RowExclusiveLock)
>     lfence
>     if (strong_lock_counts[my_strong_lock_count_partition] == 0)
>         /* marker 2 */
>         local_only
>         /* marker 3 */
>     else
>         normal_LockAcquireEx
> else
>     normal_LockAcquireEx
I think ShareUpdateExclusiveLock should be treated as neither weak nor
strong. It certainly can't be treated as weak - i.e. use the fast
path - because it's self-conflicting. It could be treated as strong,
but since it doesn't conflict with any of the weak lock types, that
would only serve to prevent fast-path lock acquisitions that otherwise
could have succeeded. In particular, it would unnecessarily disable
fast-path lock acquisition for any relation being vacuumed, which
could be really ugly considering that one of the main workloads that
would benefit from something like this is the case where lots of
backends are fighting over a lock manager partition lock on a table
they all just want to read and/or modify. I think it's best for
ShareUpdateExclusiveLock to always use the regular lock-acquisition
path, but it need not worry about incrementing strong_lock_counts[] or
importing local locks in so doing.
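To make that routing concrete, here's roughly what I have in mind. This
is only a sketch: the lock-level numbering is the ordering we already
use, but ClassifyLockMode() and the LockRoute values are names I'm
making up purely for illustration.

    #define AccessShareLock          1  /* weakest; plain SELECT */
    #define RowShareLock             2  /* SELECT FOR UPDATE/SHARE */
    #define RowExclusiveLock         3  /* INSERT/UPDATE/DELETE */
    #define ShareUpdateExclusiveLock 4  /* VACUUM, among others */
    /* ShareLock (5) through AccessExclusiveLock (8) are the "strong" modes */

    typedef enum
    {
        LOCK_ROUTE_FAST_PATH,   /* weak: may bypass the main lock table */
        LOCK_ROUTE_REGULAR,     /* normal path, no strong_lock_counts[] work */
        LOCK_ROUTE_STRONG       /* bump strong_lock_counts[], import fast-path entries */
    } LockRoute;

    static LockRoute
    ClassifyLockMode(int lockmode)
    {
        if (lockmode <= RowExclusiveLock)
            return LOCK_ROUTE_FAST_PATH;    /* the weak modes don't conflict with each other */
        if (lockmode == ShareUpdateExclusiveLock)
            return LOCK_ROUTE_REGULAR;      /* self-conflicting, but harmless to weak locks */
        return LOCK_ROUTE_STRONG;           /* conflicts with at least one weak mode */
    }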
Also, I think in the step just after marker 1, we'd import only those
locks whose lock tags were equal to the lock tag on which we
were attempting to acquire a strong lock. The downside of this whole
approach is that acquiring a strong lock becomes, at least
potentially, a lot slower, because you have to scan through the whole
backend array looking for fast-path locks to import (let's not use the
term "local lock", which is already in use within the lock manager
code). But maybe that can be optimized enough not to matter. After
all, if the lock manager scaled perfectly at high concurrency, we
wouldn't be thinking about this in the first place.
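In code, that transfer step might look something like the following.
It's only a sketch of the shape of the scan: the structures and
TransferToMainLockTable() are invented for the example, and the
locking that has to surround the per-backend arrays is discussed
below.

    #include <stdint.h>

    typedef uint32_t Oid;

    #define FP_SLOTS_PER_BACKEND 32

    typedef struct FastPathSlot
    {
        Oid relid;          /* relation OID, 0 if the slot is free */
        int lockmode;       /* the weak lock mode taken via the fast path */
    } FastPathSlot;

    typedef struct FastPathBackend
    {
        FastPathSlot slots[FP_SLOTS_PER_BACKEND];
    } FastPathBackend;

    /* hypothetical: enter a lock into the shared lock table on behalf of backend b */
    extern void TransferToMainLockTable(int backendId, Oid relid, int lockmode);

    /*
     * Before granting a strong lock on "relid", pull every backend's
     * matching fast-path entries into the main lock table.  Entries with
     * a different lock tag are left alone.
     */
    static void
    ImportMatchingFastPathLocks(FastPathBackend *backends, int nbackends, Oid relid)
    {
        for (int b = 0; b < nbackends; b++)
        {
            for (int i = 0; i < FP_SLOTS_PER_BACKEND; i++)
            {
                FastPathSlot *slot = &backends[b].slots[i];

                if (slot->relid != relid)
                    continue;
                TransferToMainLockTable(b, slot->relid, slot->lockmode);
                slot->relid = 0;    /* the entry now lives in the main table */
            }
        }
    }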
> At marker 1, we need to block until no code is running between markers two and
> three. You could do that with a per-backend lock (LW_SHARED by the strong
> locker, LW_EXCLUSIVE by the backend). That would probably still be a win over
> the current situation, but it would be nice to have something even cheaper.
I don't have a better idea than to use an LWLock. I have a patch
floating around to speed up our LWLock implementation, but I haven't
got a workload where the bottleneck is the actual speed of operation
of the LWLock rather than the fact that it's contended in the first
place. And the whole point of this would be to arrange things so that
the LWLocks are uncontended nearly all the time.
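Just to spell out the ordering I'm imagining with the per-backend
LWLock, following your suggestion of LW_SHARED for the strong locker
and LW_EXCLUSIVE for the backend itself: everything here other than
the LWLock primitives is a name invented for the sketch, and the
LWLock declarations are simplified.

    #include <stdbool.h>

    typedef enum { LW_EXCLUSIVE, LW_SHARED } LWLockMode;
    typedef struct LWLock LWLock;   /* opaque here */

    extern void LWLockAcquire(LWLock *lock, LWLockMode mode);
    extern void LWLockRelease(LWLock *lock);

    extern LWLock *per_backend_fast_path_lock[];    /* one per backend */
    extern volatile int strong_lock_counts[];       /* 1024-way partitioned counts */
    extern int NumberOfBackends;

    extern bool RecordFastPathLock(void);           /* stash the lock tag in our array */
    extern void ImportFastPathLocksFrom(int backendId); /* as in the earlier sketch */

    /*
     * Weak-lock side (markers 2 and 3): recheck the partition's count and
     * record the fast-path lock while holding our own per-backend lock
     * exclusively.  Returning false means "fall back to normal_LockAcquireEx".
     */
    static bool
    TryFastPathLockAcquire(int my_backend_id, int partition)
    {
        bool ok = false;

        LWLockAcquire(per_backend_fast_path_lock[my_backend_id], LW_EXCLUSIVE);
        if (strong_lock_counts[partition] == 0)
            ok = RecordFastPathLock();
        LWLockRelease(per_backend_fast_path_lock[my_backend_id]);

        return ok;
    }

    /*
     * Strong-lock side (marker 1): having already bumped
     * strong_lock_counts[partition], take each backend's lock in shared
     * mode.  Once we've held a backend's lock, that backend can no longer
     * be sitting between markers 2 and 3 having seen a zero count, so its
     * matching fast-path entries can safely be imported.
     */
    static void
    WaitOutFastPathLockers(void)
    {
        for (int b = 0; b < NumberOfBackends; b++)
        {
            LWLockAcquire(per_backend_fast_path_lock[b], LW_SHARED);
            ImportFastPathLocksFrom(b);
            LWLockRelease(per_backend_fast_path_lock[b]);
        }
    }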
> Then you have the actual procedure for transfer of local locks to the global
> lock manager. Using record locks in a file could work, but that's a system call
> per lock acquisition regardless of the presence of strong locks. Is that cost
> sufficiently trivial?
No, I don't think we want to go into kernel space. When I spoke of
using a file, I was imagining it as an mmap'd region that one backend
could write to and that, at need, another backend could mmap and grovel
through. Another (perhaps simpler) option would be to just put it in
shared memory. That doesn't give you as much flexibility in terms of
expanding the segment, but it would be reasonable to allow space for
only, dunno, say 32 locks per backend in shared memory; if you need
more than that, you flush them all to the main lock table and start
over. You could possibly even just make this a hack for the
particular special case where we're taking a relation lock on a
non-shared relation; then you'd need only 128 bytes for a 32-entry
array, plus the LWLock (I think the database ID is already visible in
shared memory).
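Concretely, the shared-memory piece for that special case could be
about as simple as this; the names and layout are made up, and I'm
just taking the 32-slot, 128-byte arithmetic above at face value (one
OID per slot, with the database implied by the owning backend).

    #include <stdint.h>

    typedef uint32_t Oid;
    typedef struct LWLock LWLock;   /* the existing LWLock type, opaque here */

    #define FP_LOCKS_PER_BACKEND 32

    /*
     * One of these per backend, allocated in the main shared-memory
     * segment at startup.  Only relation locks on non-shared relations in
     * the backend's current database go here; if a backend needs more
     * than 32, it flushes them all to the main lock table and starts over.
     */
    typedef struct FastPathBackendLocks
    {
        LWLock *lock;                        /* guards this backend's entries */
        Oid     relid[FP_LOCKS_PER_BACKEND]; /* 32 x 4 bytes = the 128 bytes above */
    } FastPathBackendLocks;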
> I wonder if, instead, we could signal all backends at
> marker 1 to dump the applicable parts of their local (memory) lock tables to
> files. Or to another shared memory region, if that didn't mean statically
> allocating the largest possible required amount. If we were willing to wait
> until all backends reach a CHECK_FOR_INTERRUPTS, they could instead make the
> global insertions directly. That might yield a decent amount of bug swatting to
> fill in missing CHECK_FOR_INTERRUPTS, though.
I've thought about this; I believe it's unworkable. If one backend
goes into the tank (think: SIGSTOP, or blocking on I/O to an
unreadable disk sector), this could lead to cascading failure.
> Improvements in this area would also have good synergy with efforts to reduce
> the global impact of temporary table usage. CREATE TEMP TABLE can be the
> major source of AccessExclusiveLock acquisitions. However, with the strong
> lock indicator partitioned 1024 ways or more, that shouldn't be a deal killer.
If that particular case is a problem for you, it seems like optimizing
away the exclusive lock altogether might be possible. No other
backend should be able to touch the table until the transaction
commits anyhow, and at that point we're going to release the lock.
There are possibly some sticky wickets here, but it seems at least
worth thinking about.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company