Re: Wierd context-switching issue on Xeon

From: "Sven Geisler" <sgeisler(at)aeccom(dot)com>
To: <lutzeb(at)aeccom(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgreSQL(dot)org>, "Neil Conway" <neilc(at)samurai(dot)com>
Subject: Re: Wierd context-switching issue on Xeon
Date: 2004-04-19 12:27:44
Message-ID: 004201c42609$b7321390$6402a8c0@andesW2K
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Tom,

Just to explain our hardware situation releated to the FSB of the XEON's.
We have older XEON DP in operation with FSB 400 and 2.4 GHz.
The XEON MP box runs with 2.5 GHz.
The XEON MP box is a Fujitsu Siemens Primergy RX600 with ServerWorks GC LE
as chipset.

The box, which Dirk were use to compare the behavior, is our newest XEON DP
system.
This XEON DP box runs with 2.8 GHz and FSB 533 using the Intel 7501 chipset
(Supermicro).

I would agree to Jush. When PostgreSQL has an issue with the INTEL XEON MP
hardware, this is more releated to the chipset.

Back to the SQL-Level. We use SELECT FOR UPDATE as "semaphore".
Should we try another implementation for this semahore on the client side to
prevent this issue?

Regards
Sven.

----- Original Message -----
From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: <lutzeb(at)aeccom(dot)com>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>; <pgsql-performance(at)postgreSQL(dot)org>;
"Neil Conway" <neilc(at)samurai(dot)com>
Sent: Sunday, April 18, 2004 11:47 PM
Subject: Re: [PERFORM] Wierd context-switching issue on Xeon

> After some further digging I think I'm starting to understand what's up
> here, and the really fundamental answer is that a multi-CPU Xeon MP box
> sucks for running Postgres.
>
> I did a bunch of oprofile measurements on a machine belonging to one of
> Josh's clients, using a test case that involved heavy concurrent access
> to a relatively small amount of data (little enough to fit into Postgres
> shared buffers, so that no I/O or kernel calls were really needed once
> the test got going). I found that by nearly any measure --- elapsed
> time, bus transactions, or machine-clear events --- the spinlock
> acquisitions associated with grabbing and releasing the BufMgrLock took
> an unreasonable fraction of the time. I saw about 15% of elapsed time,
> 40% of bus transactions, and nearly 100% of pipeline-clear cycles going
> into what is essentially two instructions out of the entire backend.
> (Pipeline clears occur when the cache coherency logic detects a memory
> write ordering problem.)
>
> I am not completely clear on why this machine-level bottleneck manifests
> as a lot of context swaps at the OS level. I think what is happening is
> that because SpinLockAcquire is so slow, a process is much more likely
> than you'd normally expect to arrive at SpinLockAcquire while another
> process is also acquiring the spinlock. This puts the two processes
> into a "lockstep" condition where the second process is nearly certain
> to observe the BufMgrLock as locked, and be forced to suspend itself,
> even though the time the first process holds the BufMgrLock is not
> really very long at all.
>
> If you google for Xeon and "cache coherency" you'll find quite a bit of
> suggestive information about why this might be more true on the Xeon
> setup than others. A couple of interesting hits:
>
> http://www.theinquirer.net/?article=10797
> says that Xeon MP uses a *slower* FSB than Xeon DP. This would
> translate directly to more time needed to transfer a dirty cache line
> from one processor to the other, which is the basic operation that we're
> talking about here.
>
> http://www.aceshardware.com/Spades/read.php?article_id=30000187
> says that Opterons use a different cache coherency protocol that is
> fundamentally superior to the Xeon's, because dirty cache data can be
> transferred directly between two processor caches without waiting for
> main memory.
>
> So in the short term I think we have to tell people that Xeon MP is not
> the most desirable SMP platform to run Postgres on. (Josh thinks that
> the specific motherboard chipset being used in these machines might
> share some of the blame too. I don't have any evidence for or against
> that idea, but it's certainly possible.)
>
> In the long run, however, CPUs continue to get faster than main memory
> and the price of cache contention will continue to rise. So it seems
> that we need to give up the assumption that SpinLockAcquire is a cheap
> operation. In the presence of heavy contention it won't be.
>
> One thing we probably have got to do soon is break up the BufMgrLock
> into multiple finer-grain locks so that there will be less contention.
> However I am wary of doing this incautiously, because if we do it in a
> way that makes for a significant rise in the number of locks that have
> to be acquired to access a buffer, we might end up with a net loss.
>
> I think Neil Conway was looking into how the bufmgr might be
> restructured to reduce lock contention, but if he had come up with
> anything he didn't mention exactly what. Neil?
>
> regards, tom lane
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Dave Cramer 2004-04-19 12:32:33 Re: Wierd context-switching issue on Xeon
Previous Message Greg Stark 2004-04-19 12:16:36 Re: Horribly slow hash join