ATA disks and RAID controllers for database servers

From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: pgsql-general(at)postgresql(dot)org
Subject: ATA disks and RAID controllers for database servers
Date: 2003-10-31 09:55:33
Message-ID: 3FA23195.8060506@paradise.net.nz
Lists: pgsql-general

Dear all,

Here is the first installment concerning ATA disks and RAID controller
use in a database server. I happened to have a Solaris system to myself
this week, so took the opportunity to use it as a "control".

In this post I used the ATA RAID controller merely to enable UDMA 133
for an oldish x86 machine. The effect of any actual RAID level will
(hopefully) be examined subsequently.

So what I was attempting to examine here was: is it feasible to build a
reasonably well performing database server using ATA disks? (In
particular, would disabling the ATA write cache spoil performance
completely?)

The Systems
-----------

Dell 410
2x700MHz PIII, 512MB
Promise FastTrak TX2000 Controller
2x40G 7200RPM ATA-133 Maxtor DiamondMax Plus 8 configured as JBOD
FreeBSD 4.8 (options SMP APIC_IO i686)
PostgreSQL 7.4beta2 (-O2 -funroll-loops -fexpensive-optimizations
-march=i686)
ATA write caching controlled via the loader.conf variable hw.ata.wc (1 = on)

Sun 280R
1x900MHz UltraSparc III, 1024MB
1x36G 10000RPM FCAL Sun (actually Seagate)
Solaris 8 (recommended patches)
PostgreSQL 7.4beta2 (-O2 -funroll-loops -fexpensive-optimizations)

The Tests
---------

1. Sequential and random writes and reads of a file twice the size of memory

Files were read and written using the read(2) and write(2) system calls,
buffered at 8K. For the random case 10% of the file was sampled using
lseek(2), and each sampled block was read or written.
(see
http://techdocs.postgresql.org/markir/download/iowork/iowork-1.0.tar.gz)
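For readers without the tarball to hand, here is a rough sketch of the kind of measurement described above. It is not the iowork code: the function names, the fsync before stopping the clock, and the fixed random seed are all assumptions made for illustration. Only the 8K buffer and the 10% sample come from the post.

```python
import os
import random
import tempfile
import time

BLOCK = 8192  # 8K buffer, as stated in the post


def seq_write(path, nblocks):
    """Write nblocks 8K blocks sequentially and return throughput in MB/s."""
    buf = b"\0" * BLOCK
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(nblocks):
            os.write(fd, buf)
        # Assumption: fsync so the timing includes getting data to the disk.
        os.fsync(fd)
    finally:
        os.close(fd)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return nblocks * BLOCK / elapsed / 1e6


def random_read(path, nblocks, fraction=0.10, seed=0):
    """Seek to a random sample of blocks and read each one.

    Returns (blocks_read, throughput in MB/s).
    """
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability
    sample = max(1, int(nblocks * fraction))
    start = time.perf_counter()
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(sample):
            os.lseek(fd, rng.randrange(nblocks) * BLOCK, os.SEEK_SET)
            os.read(fd, BLOCK)
    finally:
        os.close(fd)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return sample, sample * BLOCK / elapsed / 1e6
```

To mean anything, nblocks has to make the file about twice the size of memory (as in the test description); with a small file the numbers mostly reflect the OS buffer cache.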

2. PostgreSQL pgbench benchmark program

This was run using the options :
-t 1000 [ 1000 transactions ]
-s 10 [ scale factor 10 ]
-c 1,2,4,8,16 [ 1-16 clients ]

Non-default postgresql.conf settings were:
shared_buffers = 5000
wal_buffers = 100
checkpoint_segments = 10

A checkpoint was forced after each run to prevent cross-run interference.
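The run sequence could be scripted roughly as follows. This is a sketch that only builds the command lines, it does not execute them. The database name "bench", passing -s at initialisation time (pgbench -i), and issuing the checkpoint via psql are assumptions; the post does not say exactly how the runs were driven.

```python
def pgbench_commands(dbname="bench", scale=10, transactions=1000,
                     clients=(1, 2, 4, 8, 16)):
    """Return the argument lists for one full benchmark pass.

    dbname is a hypothetical database name. The scale factor is applied
    when the test tables are initialised (assumption).
    """
    cmds = [["pgbench", "-i", "-s", str(scale), dbname]]
    for c in clients:
        # One benchmark run at this client count.
        cmds.append(["pgbench", "-c", str(c), "-t", str(transactions), dbname])
        # Force a checkpoint so WAL activity does not leak into the next run.
        cmds.append(["psql", "-c", "CHECKPOINT", dbname])
    return cmds
```

Each list can be handed to subprocess.run() as-is; printing them first is a cheap way to check the sequence before committing to a long run.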

Results
-------

Test 1

System  IO Operation   Throughput (MB/s)  Options
-------------------------------------------------
Sun     seq write      21
        seq read       48
        random write   2.8
        random read    2.2

Dell    seq write      11                 hw.ata.wc=0
        seq read       50                 hw.ata.wc=0
        random write   1.27               hw.ata.wc=0
        random read    4.2                hw.ata.wc=0

Dell    seq write      20                 hw.ata.wc=1
        seq read       53                 hw.ata.wc=1
        random write   1.69               hw.ata.wc=1
        random read    4.1                hw.ata.wc=1

Test 2

System  Clients  Throughput (tps)  Options
------------------------------------------
Sun     1        18
        2        18
        4        22
        8        23
        16       28

Dell    1        27                hw.ata.wc=0
        2        38                hw.ata.wc=0
        4        55                hw.ata.wc=0
        8        58                hw.ata.wc=0
        16       66                hw.ata.wc=0

Dell    1        82                hw.ata.wc=1
        2        137               hw.ata.wc=1
        4        166               hw.ata.wc=1
        8        128               hw.ata.wc=1
        16       117               hw.ata.wc=1

Conclusions
-----------

Test 1

As far as sequential reading goes, there is not much to choose between
ATA and SCSI.

ATA with write caching off does only about half as well as SCSI for
sequential writes. It also fares poorly at random writes - even with
write caching on.

The random read result was surprising - I was expecting SCSI to perform
better on all random operations (seek time on the SCSI drive is about
1/2 that of the ATA). Suspecting that my program was measuring wrongly,
I ran similar tests with Bonnie - it finds the ATA drive can do 4
*times* more seeks/s - hmmm (Bonnie gets the same sequential throughput
numbers too).
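The comparisons above are easy to check against the Test 1 table. A small sketch, using only the figures reported in this post (the dictionary layout and function name are just for illustration):

```python
# Throughput in MB/s, copied from the Test 1 results table.
TEST1 = {
    "Sun": {"seq write": 21, "seq read": 48,
            "random write": 2.8, "random read": 2.2},
    "Dell wc=0": {"seq write": 11, "seq read": 50,
                  "random write": 1.27, "random read": 4.2},
    "Dell wc=1": {"seq write": 20, "seq read": 53,
                  "random write": 1.69, "random read": 4.1},
}


def ratio_to_sun(system):
    """Return each operation's throughput relative to the Sun (SCSI) figure."""
    sun = TEST1["Sun"]
    return {op: round(TEST1[system][op] / sun[op], 2) for op in sun}
```

For the cache-off case, ratio_to_sun("Dell wc=0") gives about 0.52 for sequential writes and 0.45 for random writes, while random reads come out near 1.9 times the Sun figure.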

A point to note for *both* systems is that all disks were new, so have
not yet 'burned in' - I don't know how significant this might be (anyone?).

Test 2

Hmmm, the 3-year-old Dell 410 hammers this year's Sun 280R (write caching
on or off). Now it is well known that Solaris is not the fastest platform
for Pg, so maybe let's contain the excitement here. I did experiment
with using bsdmalloc to improve Solaris memory performance, but saw no
significant improvement (any other ideas?).

But it seems safe to conclude that it's possible to construct a
reasonably well performing ATA based system - even if write caching is off.
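To put rough numbers on that, the tps figures from the Test 2 table can be compared directly. A sketch using only the values reported above (names are illustrative):

```python
# Transactions per second by client count, copied from the Test 2 table.
SUN = {1: 18, 2: 18, 4: 22, 8: 23, 16: 28}
DELL_WC_OFF = {1: 27, 2: 38, 4: 55, 8: 58, 16: 66}
DELL_WC_ON = {1: 82, 2: 137, 4: 166, 8: 128, 16: 117}


def ratios(a, b):
    """Return a[c] / b[c] for each client count c, rounded to 2 places."""
    return {c: round(a[c] / b[c], 2) for c in a}


def peak_clients(results):
    """Return the client count at which throughput is highest."""
    return max(results, key=results.get)
```

ratios(DELL_WC_OFF, SUN) ranges from 1.5 at one client to about 2.5 at 4 and 8 clients, so the Dell stays ahead even with the write cache off. With the cache on, peak_clients(DELL_WC_ON) is 4, after which throughput drops off.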

Criticisms
----------

Using "-s 10" only produces a database of 160M - this is cacheable when
you have 512/1024M real memory, so maybe "-s 100" would defeat the
cache. I am currently running some tests with this configuration.
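A back-of-envelope check of that estimate, assuming database size grows linearly with the scale factor (an assumption; only the 160M figure at "-s 10" was observed):

```python
# 160M observed at scale factor 10, so about 16M per scale unit
# (assumes linear growth with scale).
MB_PER_SCALE_UNIT = 160 / 10


def estimated_db_mb(scale):
    """Estimate the pgbench database size in MB for a given scale factor."""
    return scale * MB_PER_SCALE_UNIT


def fits_in_memory(scale, memory_mb):
    """True if the estimated database would fit entirely in memory_mb."""
    return estimated_db_mb(scale) <= memory_mb
```

On this estimate "-s 100" comes to about 1600M, which is larger than the memory of either machine.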

Comparing a dual-processor Intel to a single-processor Sun is not fair -
well, a 900MHz UltraSparc III is *supposed* to correspond to a 1.4GHz
Intel, so 2x700MHz PIIIs should be a fair match. However it does look
like the two PIIIs hammer it a bit...

regards

Mark
