Performance problems on 4-way AMD Opteron 875 (dual core)

From: Dirk Lutzebäck <lutzeb(at)aeccom(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Sven Geisler <sgeisler(at)aeccom(dot)com>
Subject: Performance problems on 4-way AMD Opteron 875 (dual core)
Date: 2005-08-05 11:11:31
Message-ID: 42F34963.5080403@aeccom.com
Lists: pgsql-performance

[[I'm posting this on behalf of my co-worker who cannot post to this
list at the moment]]

Hi,

I have installed PostgreSQL on a 4-way AMD Opteron 875 (dual core), and
the performance is not at the expected level.

Details:
The "old" server is a 4-way XEON MP 3.0 GHz with 4MB L3 cache, 32 GB RAM
(PC1600) and local FC-RAID 10. Hyper-Threading is off. (DL580)
The "old" server is using Red Hat Enterprise Linux 3 Update 5.
The "new" server is a 4-way Opteron 875 with 1 MB L2 cache, 32 GB RAM
(PC3200) and the same local FC-RAID 10. (HP DL585)
The "new" server is using Red Hat Enterprise Linux 4 (with the latest
x86_64 kernel from Red Hat - 2.6.9-11.ELsmp #1 SMP Fri May 20 18:25:30
EDT 2005 x86_64)
I use PostgreSQL version 8.0.3.

The issue is that the Opteron is slower than the XEON MP under high load.
I have created a test with parallel queries that are typical for my
application. The queries range from small queries (0.1 seconds) to
larger queries using joins (15 seconds).
The test starts parallel clients. Each client runs the queries in a
random order. The test makes sure that a given client always uses the
same random order, so that the results are comparable between runs.
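
A minimal sketch of such a test driver, to make the setup above concrete.
This is not the harness used for the numbers in this post: the query list,
the function names, and the use of threads are assumptions, and run_query
only sleeps where a real test would execute SQL through a database driver.
The one point taken from the description is that each client shuffles the
queries with a fixed seed, so it sees the same "random" order on every run.

```python
import random
import threading
import time

# Hypothetical stand-ins for the application's queries:
# (name, simulated duration in seconds).
QUERIES = [("small_lookup", 0.001), ("medium_report", 0.003), ("large_join", 0.005)]


def query_order(client_id, queries=QUERIES):
    """Return the query order for one client.

    Seeding the generator with the client id makes the order random
    but identical on every run for that client.
    """
    rng = random.Random(client_id)
    order = list(queries)
    rng.shuffle(order)
    return order


def run_query(query):
    """Placeholder: a real test would send the SQL to the server here."""
    time.sleep(query[1])


def client_loop(client_id, deadline, counts):
    """Run this client's queries in its fixed order until the deadline."""
    order = query_order(client_id)
    done = 0
    while time.monotonic() < deadline:
        for query in order:
            if time.monotonic() >= deadline:
                break
            run_query(query)
            done += 1
    counts[client_id] = done


def run_test(num_clients, period_seconds):
    """Start the clients in parallel and return the total queries finished."""
    counts = {}
    deadline = time.monotonic() + period_seconds
    threads = [
        threading.Thread(target=client_loop, args=(i, deadline, counts))
        for i in range(num_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts.values())
```

Calling run_test(4, 60) and run_test(8, 60) on each machine would give
the kind of "queries finished in a fixed period" figure compared in this
post.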

Here is the number of queries that each server finished in a fixed
period of time.
I used a PostgreSQL 8.1 snapshot from last week compiled as a 64-bit
binary for DL585-64bit.
I used PostgreSQL 8.0.3 compiled as a 32-bit binary for DL585-32bit and
DL580.
During the tests everything that is needed is in the file cache. I
did not see any read activity.
Context switch spikes are over 50000 during the test on both servers. My
feeling is that the XEON has slightly more context switches.
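
The post does not say how the context switches were measured. One way to
get a per-second figure on Linux is to sample the cumulative "ctxt"
counter in /proc/stat twice and divide the difference by the interval.
The sketch below assumes that approach; the function names are
illustrative only.

```python
import time


def parse_ctxt(stat_text):
    """Extract the cumulative context switch count from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")


def ctxt_rate(before, after, interval_seconds):
    """Context switches per second between two counter samples."""
    return (after - before) / interval_seconds


def sample_rate(interval_seconds=1.0, path="/proc/stat"):
    """Read the counter twice, interval_seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_ctxt(f.read())
    time.sleep(interval_seconds)
    with open(path) as f:
        after = parse_ctxt(f.read())
    return ctxt_rate(before, after, interval_seconds)
```

Logging sample_rate() once per second on both servers during the same
test run would show whether the XEON really has more context switches
than the Opteron.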

PostgreSQL params:
max_locks_per_transaction = 256
shared_buffers = 40000
effective_cache_size = 3840000
work_mem = 300000
maintenance_work_mem = 512000
wal_buffers = 32
checkpoint_segments = 24
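
For readers comparing these values with their own setup: in PostgreSQL
8.0 these parameters are plain numbers without unit suffixes. The sketch
below converts them to megabytes, assuming the default build settings
(8 kB pages, 16 MB WAL segments): shared_buffers, effective_cache_size
and wal_buffers count pages, work_mem and maintenance_work_mem are in
kilobytes, and checkpoint_segments counts WAL segments. The helper names
are illustrative only.

```python
PAGE_KB = 8          # assumed default block size
WAL_SEGMENT_MB = 16  # assumed default WAL segment size

# name -> (configured value, size of one unit in kB)
SETTINGS = {
    "shared_buffers": (40000, PAGE_KB),
    "effective_cache_size": (3840000, PAGE_KB),
    "work_mem": (300000, 1),
    "maintenance_work_mem": (512000, 1),
    "wal_buffers": (32, PAGE_KB),
}


def to_mb(value, unit_kb):
    """Convert a setting given in units of unit_kb kilobytes to megabytes."""
    return value * unit_kb / 1024


def summarize(settings=SETTINGS):
    """Return every setting expressed in megabytes."""
    return {name: to_mb(value, unit_kb) for name, (value, unit_kb) in settings.items()}


if __name__ == "__main__":
    for name, mb in summarize().items():
        print(f"{name}: {mb:g} MB")
    print(f"checkpoint_segments: {24 * WAL_SEGMENT_MB} MB of WAL between checkpoints")
```

Under these assumptions shared_buffers is about 312 MB and
effective_cache_size about 29 GB of the 32 GB RAM.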

I was expecting twice as many queries on the DL585. The DL585 with
PostgreSQL 8.0.3 32-bit melts down earlier than the XEON in production
use. Please compare 4 clients and 8 clients: with 4 clients the Opteron
is in front, and with 8 clients the XEON does not melt down as much as
the Opteron.

I don't have any idea what causes this. Benchmarks like SAP's SD 2-tier
show that the DL585 can handle nearly three times more load than the
DL580 with XEON 3.0. We chose the 4-way Opteron 875 based on such
benchmarks to replace the 4-way XEON MP.

Does anyone have comments or ideas on what I should focus my work on?

I guess the shared buffers cause the meltdown when too many clients are
accessing the same data.
I don't understand why the 4-way XEON MP 3.0 can deal with this better
than the 4-way Opteron 875.
The system load on the Opteron is never over 3.0. The XEON MP has a load
of up to 4.0.

Should I try other settings for PostgreSQL in postgresql.conf?
Should I try other settings for the compilation?

I will compile the latest PostgreSQL 8.1 snapshot for 32-bit to evaluate
the new shared buffer code from Tom.
I think the 64-bit build is slow because my queries are CPU intensive.

Can someone provide a commercial support contact for this issue?

Sven.
