From: | Dave Cramer <pg(at)fastcrypt(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | lutzeb(at)aeccom(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgreSQL(dot)org, Neil Conway <neilc(at)samurai(dot)com> |
Subject: | Re: Wierd context-switching issue on Xeon |
Date: | 2004-04-18 23:34:41 |
Message-ID: | 1082331281.1557.47.camel@localhost.localdomain |
Lists: | pgsql-performance |
So the kernel/OS is irrelevant here? Does this happen on any dual Xeon?
What about hyperthreading: does it still happen if HTT is turned off?
Dave
On Sun, 2004-04-18 at 17:47, Tom Lane wrote:
> After some further digging I think I'm starting to understand what's up
> here, and the really fundamental answer is that a multi-CPU Xeon MP box
> sucks for running Postgres.
>
> I did a bunch of oprofile measurements on a machine belonging to one of
> Josh's clients, using a test case that involved heavy concurrent access
> to a relatively small amount of data (little enough to fit into Postgres
> shared buffers, so that no I/O or kernel calls were really needed once
> the test got going). I found that by nearly any measure --- elapsed
> time, bus transactions, or machine-clear events --- the spinlock
> acquisitions associated with grabbing and releasing the BufMgrLock took
> an unreasonable fraction of the time. I saw about 15% of elapsed time,
> 40% of bus transactions, and nearly 100% of pipeline-clear cycles going
> into what is essentially two instructions out of the entire backend.
> (Pipeline clears occur when the cache coherency logic detects a memory
> write ordering problem.)
>
> I am not completely clear on why this machine-level bottleneck manifests
> as a lot of context swaps at the OS level. I think what is happening is
> that because SpinLockAcquire is so slow, a process is much more likely
> than you'd normally expect to arrive at SpinLockAcquire while another
> process is also acquiring the spinlock. This puts the two processes
> into a "lockstep" condition where the second process is nearly certain
> to observe the BufMgrLock as locked, and be forced to suspend itself,
> even though the time the first process holds the BufMgrLock is not
> really very long at all.
>
> If you google for Xeon and "cache coherency" you'll find quite a bit of
> suggestive information about why this might be more true on the Xeon
> setup than others. A couple of interesting hits:
>
> http://www.theinquirer.net/?article=10797
> says that Xeon MP uses a *slower* FSB than Xeon DP. This would
> translate directly to more time needed to transfer a dirty cache line
> from one processor to the other, which is the basic operation that we're
> talking about here.
>
> http://www.aceshardware.com/Spades/read.php?article_id=30000187
> says that Opterons use a different cache coherency protocol that is
> fundamentally superior to the Xeon's, because dirty cache data can be
> transferred directly between two processor caches without waiting for
> main memory.
>
> So in the short term I think we have to tell people that Xeon MP is not
> the most desirable SMP platform to run Postgres on. (Josh thinks that
> the specific motherboard chipset being used in these machines might
> share some of the blame too. I don't have any evidence for or against
> that idea, but it's certainly possible.)
>
> In the long run, however, CPUs continue to get faster than main memory
> and the price of cache contention will continue to rise. So it seems
> that we need to give up the assumption that SpinLockAcquire is a cheap
> operation. In the presence of heavy contention it won't be.
>
> One thing we probably have got to do soon is break up the BufMgrLock
> into multiple finer-grain locks so that there will be less contention.
> However I am wary of doing this incautiously, because if we do it in a
> way that makes for a significant rise in the number of locks that have
> to be acquired to access a buffer, we might end up with a net loss.
>
> I think Neil Conway was looking into how the bufmgr might be
> restructured to reduce lock contention, but if he had come up with
> anything he didn't mention exactly what. Neil?
>
> regards, tom lane
>
--
Dave Cramer
519 939 0336
ICQ # 14675561
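
For readers following the quoted analysis: the cost it describes comes from the atomic test-and-set in SpinLockAcquire pulling a dirty cache line from one CPU to another. The sketch below is illustrative only. It is not PostgreSQL's s_lock code, and every name in it (demo_spinlock, demo_spin_acquire, max_spins) is hypothetical. It shows a C11 "test-and-test-and-set" lock, a common way to cut down on that traffic: a waiter spins on a plain read and attempts the atomic exchange only when the lock looks free. Whether this would help on the Xeon MP machines discussed here is not something the thread establishes.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical demo type; not PostgreSQL's slock_t. */
typedef struct {
    atomic_bool locked;
} demo_spinlock;

static void demo_spin_init(demo_spinlock *l)
{
    atomic_init(&l->locked, false);
}

/*
 * Try to take the lock, giving up after max_spins attempts.
 * Returns true if the lock was acquired, false if the caller should
 * back off (for example by sleeping or yielding the CPU).
 */
static bool demo_spin_acquire(demo_spinlock *l, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        /* Cheap read-only test: while the lock is held, keep reading
         * and skip the atomic write that would claim the cache line. */
        if (atomic_load_explicit(&l->locked, memory_order_relaxed))
            continue;

        /* The lock looked free, so attempt the atomic exchange.
         * A previous value of false means we now own the lock. */
        if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
            return true;
    }
    return false;
}

static void demo_spin_release(demo_spinlock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

A caller would wrap a short critical section between demo_spin_acquire and demo_spin_release and fall back to a sleep when the acquire returns false. That fallback corresponds to the "forced to suspend itself" step in the quoted message.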