From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Reducing Transaction Start/End Contention |
Date: | 2007-09-14 03:16:42 |
Message-ID: | 200709140316.l8E3Ggq19385@momjian.us |
Lists: | pgsql-hackers |
This has been saved for the 8.4 release:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold
---------------------------------------------------------------------------
Simon Riggs wrote:
> Jignesh Shah's scalability testing on Solaris has revealed further
> tuning opportunities surrounding the start and end of a transaction.
> Tuning that should be especially important since async commit is likely
> to allow much higher transaction rates than were previously possible.
>
> There is strong contention on the ProcArrayLock in Exclusive mode, with
> the top path being CommitTransaction(). This becomes clear as the number
> of connections increases, but it seems likely that the contention can be
> caused in a range of other circumstances. My thoughts on the causes of
> this contention are that the following 3 tasks contend with each other
> in the following way:
>
> CommitTransaction(): takes ProcArrayLock Exclusive
> but only needs access to one ProcArray element
>
> waits for
>
> GetSnapshotData():ProcArrayLock Shared
> ReadNewTransactionId():XidGenLock Shared
>
> which waits for
>
> GetNewTransactionId()
> takes XidGenLock Exclusive
> ExtendCLOG(): takes CLogControlLock Exclusive, WALInsertLock Exclusive
> two possible places where I/O is required
> ExtendSUBTRANS(): takes SubtransControlLock
> one possible place where I/O is required
> Avoids lock on ProcArrayLock: atomically updates one ProcArray element
>
>
> or more simply:
>
> CommitTransaction() -- i.e. once per transaction
> waits for
> GetSnapshotData() -- i.e. once per SQL statement
> which waits for
> GetNewTransactionId() -- i.e. once per transaction
>
> This gives some goals for scalability improvements and some proposals.
> (1) and (2) are proposals for 8.3 tuning, the others are directions for
> further research.
>
>
> Goal: Reduce total time that GetSnapshotData() waits for
> GetNewTransactionId()
>
> 1. Increase size of Clog-specific BLCKSZ
> Clog currently uses BLCKSZ to define the size of clog buffers. This can
> be changed to use CLOG_BLCKSZ, which would then be set to 32768.
> This will naturally increase the amount of memory allocated to the clog,
> so we need not alter CLOG_BUFFERS above 8 if we do this (as previously
> suggested, with successful results). It will also reduce the number of
> ExtendCLOG() calls, which will probably reduce the overall contention
> as well.
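
As a rough worked example of the arithmetic behind proposal (1): clog stores two status bits per transaction, so one page covers four transactions per byte. The sketch below is in Python for brevity; the constant names mirror those in clog.c, while the function names and the one-million-transaction workload are hypothetical and only illustrate how a larger page reduces the number of page extensions.

```python
CLOG_BITS_PER_XACT = 2                          # two status bits per transaction
CLOG_XACTS_PER_BYTE = 8 // CLOG_BITS_PER_XACT   # four transactions per byte


def xacts_per_page(block_size: int) -> int:
    """Number of transaction statuses that fit in one clog page."""
    return block_size * CLOG_XACTS_PER_BYTE


def extensions_needed(n_xacts: int, block_size: int) -> int:
    """Clog pages (and so page extensions) needed to cover n_xacts, rounded up."""
    per_page = xacts_per_page(block_size)
    return -(-n_xacts // per_page)


if __name__ == "__main__":
    for size in (8192, 32768):
        print(size, xacts_per_page(size), extensions_needed(1_000_000, size))
```

With an 8192-byte page a new clog page is needed every 32768 transactions; with a 32768-byte page, every 131072.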
>
> 2. Perform ExtendCLOG() as a background activity
> A background process can look at the next transactionid once each cycle
> without holding any lock. If the xid is almost at the point where a new
> clog page would be allocated, then it will allocate one prior to the new
> page being absolutely required. Doing this as a background task would
> mean that we do not need to hold the XidGenLock in exclusive mode while
> we do this, which means that GetSnapshotData() and CommitTransaction()
> would also be less likely to block. Also, if any clog writes need to be
> performed when the page is moved forwards this would also be performed
> in the background.
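
The check described in proposal (2) might look roughly like the following sketch. It is not taken from any patch: `should_preextend`, `lookahead` and `highest_allocated_page` are hypothetical names, the page size assumes the default 8 kB BLCKSZ, and xid wraparound is ignored for simplicity.

```python
CLOG_XACTS_PER_PAGE = 8192 * 4  # assumes default 8 kB BLCKSZ, 4 xacts per byte


def page_of(xid: int) -> int:
    """Clog page number that holds the status bits for xid."""
    return xid // CLOG_XACTS_PER_PAGE


def should_preextend(next_xid: int, highest_allocated_page: int,
                     lookahead: int) -> bool:
    """True if an xid within `lookahead` of next_xid lands on a page
    that has not been allocated yet (hypothetical background check)."""
    return page_of(next_xid + lookahead) > highest_allocated_page


if __name__ == "__main__":
    # 100 xids before the first page boundary, looking 1000 xids ahead
    print(should_preextend(CLOG_XACTS_PER_PAGE - 100, 0, 1000))
```

The background process would read the next xid without a lock, run a check of this kind once per cycle, and allocate the page early when it returns true; foreground backends would then rarely need to extend the clog themselves.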
>
> 3. Consider whether ProcArrayLock should use a new queued-shared lock
> mode that puts a maximum wait time on ExclusiveLock requests. It would
> be fairly hard to implement this well as a timer, but it might be
> possible to place a limit on queue length, i.e. allow Shared locks to be
> granted immediately if a Shared holder already exists, but only if no
> more than N Exclusive mode requests are queued. This might
> prevent the worst cases of exclusive lock starvation.
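
One possible reading of the rule in proposal (3), written as a small decision function. This is a sketch under stated assumptions, not LWLock code: the function and parameter names are hypothetical, and the choice to make a Shared request wait when the lock is momentarily free but Exclusive requests are queued is an assumption that the proposal does not spell out.

```python
def can_grant_shared_now(exclusive_held: bool, shared_holders: int,
                         queued_exclusive: int,
                         max_queued_exclusive: int) -> bool:
    """Decide whether a new Shared request may skip the wait queue."""
    if exclusive_held:
        return False              # must wait behind the Exclusive holder
    if queued_exclusive == 0:
        return True               # nobody waiting who could be starved
    # Queue jumping is allowed only while a Shared holder exists and the
    # number of waiting Exclusive requests does not exceed the limit N.
    return shared_holders > 0 and queued_exclusive <= max_queued_exclusive


if __name__ == "__main__":
    print(can_grant_shared_now(False, 1, 2, 2))  # within the limit
    print(can_grant_shared_now(False, 1, 3, 2))  # limit exceeded: must queue
```

Once more than N Exclusive requests are waiting, new Shared requests queue behind them, which bounds how long an Exclusive requestor can be overtaken.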
>
> 4. Since shared locks are currently queued behind exclusive requests
> when they cannot be immediately satisfied, it might be worth
> reconsidering the way LWLockRelease works also. When we wake up the
> queue we only wake the Shared requests that are adjacent to the head of
> the queue. Instead we could wake *all* waiting Shared requestors.
>
> e.g. with a lock queue like this:
> (HEAD) S<-S<-X<-S<-X<-S<-X<-S
> Currently we would wake the 1st and 2nd waiters only.
>
> If we were to wake the 4th, 6th and 8th waiters also, then the queue
> would reduce in length very quickly, if we assume generally uniform
> service times. (If the head of the queue is X, then we wake only that
> one process and I'm not proposing we change that). That would mean queue
> jumping, right? Well, that's what already happens in other circumstances,
> so there cannot be anything intrinsically wrong with allowing it, the
> only question is: would it help?
>
> We need not wake the whole queue, there may be some generally more
> beneficial heuristic. The reason for considering this is not to speed up
> Shared requests but to reduce the queue length and thus the waiting time
> for the Exclusive requestors. Each time a Shared request is dequeued, we
> effectively re-enable queue jumping, so a Shared request arriving during
> that point will actually jump ahead of Shared requests that were unlucky
> enough to arrive while an Exclusive lock was held. Worse than that, the
> new incoming Shared requests exacerbate the starvation, so the more
> non-adjacent groups of Shared lock requests there are in the queue, the
> worse the starvation of the exclusive requestors becomes. We are
> effectively randomly starving some shared locks as well as exclusive
> locks in the current scheme, based upon the state of the lock when they
> make their request. The situation is worst when the lock is heavily
> contended and the workload has a 50/50 mix of shared/exclusive requests,
> e.g. serializable transactions or transactions with lots of
> subtransactions.
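
The two wake-up policies discussed in proposal (4) can be compared on the example queue with a small model. This is a sketch only: the queue is a plain list of "S" and "X" markers, and the function names are hypothetical rather than anything in lwlock.c.

```python
def wake_current(queue):
    """Current policy as described above: wake the head waiter; if it is
    Shared, also wake the Shared waiters directly behind it."""
    if not queue:
        return [], []
    if queue[0] == "X":
        return ["X"], queue[1:]
    n = 0
    while n < len(queue) and queue[n] == "S":
        n += 1
    return queue[:n], queue[n:]


def wake_all_shared(queue):
    """Proposed policy: if the head is Shared, wake every Shared waiter
    in the queue; if the head is Exclusive, wake only that one."""
    if not queue:
        return [], []
    if queue[0] == "X":
        return ["X"], queue[1:]
    woken = [w for w in queue if w == "S"]
    remaining = [w for w in queue if w == "X"]
    return woken, remaining


if __name__ == "__main__":
    q = list("SSXSXSXS")   # (HEAD) S<-S<-X<-S<-X<-S<-X<-S
    print(wake_current(q))
    print(wake_all_shared(q))
```

On this queue the current policy wakes two waiters and leaves six queued, while waking all Shared waiters leaves only the three Exclusive requests, which is the reduction in queue length the proposal is after.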
>
>
> Goal: Reduce the total time that CommitTransaction() waits for
> GetSnapshotData()
>
> 5. Reduce the time that GetSnapshotData holds ProcArrayLock. To do
> this, we split the ProcArrayLock into multiple partitions (as suggested
> by Alvaro). There are comments in GetNewTransactionId() about having one
> spinlock per ProcArray entry. This would be too many and we could reduce
> contention by having one lock for each N ProcArray entries. Since we
> don't see too much contention with 100 users (default) it would seem
> sensible to make N ~ 120. Striped or contiguous? If we stripe the lock
> partitions then we will need multiple partitions however many users we
> have connected, whereas using contiguous ranges would allow one lock for
> low numbers of users and yet enough locks for higher numbers of users.
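
The striped versus contiguous question in proposal (5) comes down to how a ProcArray slot number maps to a lock partition. The sketch below illustrates the difference; the function names are hypothetical, and it assumes that connected backends occupy the lowest-numbered slots, which is what the contiguous argument relies on.

```python
def contiguous_partition(slot: int, entries_per_lock: int) -> int:
    """Slots 0..N-1 share lock 0, slots N..2N-1 share lock 1, and so on."""
    return slot // entries_per_lock


def striped_partition(slot: int, num_locks: int) -> int:
    """Neighbouring slots are spread round-robin across all locks."""
    return slot % num_locks


def locks_touched(active_slots, mapping) -> int:
    """How many distinct partition locks a scan of active_slots would need."""
    return len({mapping(s) for s in active_slots})


if __name__ == "__main__":
    users = range(100)   # 100 connected users in slots 0..99 (assumed)
    print(locks_touched(users, lambda s: contiguous_partition(s, 120)))
    print(locks_touched(users, lambda s: striped_partition(s, 4)))
```

With N = 120 and 100 users, the contiguous layout needs a single lock for a full scan, whereas a striped layout over four locks touches all four regardless of how few users are connected.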
>
> 6. Reduce the number of times ProcArrayLock is taken in Exclusive mode.
> To do this, optimise group commit so that all of the actions for
> multiple transactions are executed together: flushing WAL, updating CLOG
> and updating ProcArray, whenever it is appropriate to do so. There's no
> point in having a group commit facility that optimises just one of those
> contention points when all 3 need to be considered. That needs to be
> done as part of a general overhaul of group commit. This would include
> making TransactionLogMultiUpdate() take CLogControlLock once for each
> page that it needs to access, which would also reduce contention from
> TransactionIdCommitTree().
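
The per-page locking idea at the end of proposal (6) can be sketched by grouping the xids to be updated by the clog page they live on, so that the lock is taken once per group instead of once per xid. The helper name is hypothetical and the page size assumes the default 8 kB BLCKSZ; real code would also have to do the status update itself while holding the lock.

```python
from collections import defaultdict

CLOG_XACTS_PER_PAGE = 8192 * 4  # assumes default 8 kB BLCKSZ, 4 xacts per byte


def group_by_clog_page(xids):
    """Map clog page number -> list of xids stored on that page."""
    pages = defaultdict(list)
    for xid in xids:
        pages[xid // CLOG_XACTS_PER_PAGE].append(xid)
    return dict(pages)


if __name__ == "__main__":
    # A transaction and its subtransactions straddling a page boundary
    xids = list(range(32760, 32776))
    groups = group_by_clog_page(xids)
    print(len(xids), "xids on", len(groups), "pages")
```

For the sixteen xids in the example, locking once per page means two lock acquisitions rather than sixteen.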
>
> (1) and (2) can be patched fairly easily for 8.3. I have a prototype
> patch for (1) on the shelf already from 6 months ago.
>
> (3), (4) and (5) seem like changes that would require significant
> testing time to ensure we did it correctly, even though the patches
> might be fairly small. I'm thinking this is probably an 8.4 change, but
> I can get test versions out fairly quickly I think.
>
> (6) seems definitely an 8.4 change.
>
> --
> Simon Riggs
> EnterpriseDB http://www.enterprisedb.com
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +