Disk performance, was Re: tablespaces and DB administration

From: Marty Scholes <marty(at)outputservices(dot)com>
To: pgsql-performance(at)postgresql(dot)org, pgsql(at)mohawksoft(dot)com, pgadmin(at)pse-consulting(dot)de
Cc: marty(at)outputservices(dot)com
Subject: Disk performance, was Re: tablespaces and DB administration
Date: 2004-06-02 00:08:34
Message-ID: 40BD1A82.7010001@outputservices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

This was a lively debate on what was faster, single spindles or RAID.

This is important, because I keep running into people who do not
understand the performance dynamics of a RDBMS like Oracle or Pg.

Pg and Oracle make a zillion tiny reads and writes and fsync()
regularly. If your drive will copy a 4GB file at a sustained rate of 72
MB/s, that tells you nothing about how it will do with an RDBMS.

I will throw in my experience on RAID vs spindles.

With the RAID write cache disabled, a well balanced set of spindles will
kill a RAID system any day.

Enable the cache, and the RAID starts inching ahead. My experience is
that no one can continuously keep I/O properly balanced across several
spindles on a production system. Things change and the I/O mix changes.
Then, the RAID is outperforming the spindles. If you want to spend
the rest of your career constantly balancing I/O across spindles, then
do so.

For the rest of us, with a write cache, a hardware RAID wins hands down
over the long haul.

It might make sense to provide some sort of benchmarking tool for
various systems so that we can predict I/O performance.

Run the following code both on a hardware RAID and on a single spindle.

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <time.h>

void makefile(int fs)
{
int i;
char buf[8192];
int ld;
int blocks=4096;
int pos;
time_t stim;
time_t etim;
float avg;

unlink("dump.out");
ld=open("dump.out", O_WRONLY | O_CREAT);

printf("Writing %d blocks sequentially\n", blocks);
time(&stim);
for (i=0; i<blocks; i++) {
write(ld, buf, sizeof(buf));
if (fs) {
fsync(ld);
}
}
time(&etim);
avg = (blocks+0.0)/(etim-stim-0.0);
printf("Took %d seconds, avg %f iops\n\n", etim-stim, avg);

// purge the write cache
fsync(ld);

printf("Writing %d blocks (somewhat randomly)\n", blocks);
time(&stim);
for (i=0; i<blocks; i++) {
pos = (rand()%blocks)*sizeof(buf);
lseek(ld, pos, SEEK_SET);
write(ld, buf, sizeof(buf));
if (fs) {
fsync(ld);
}
}
time(&etim);
avg = (blocks+0.0)/(etim-stim-0.0);
printf("Took %d seconds, avg %f iops\n\n", etim-stim, avg);

close(ld);
unlink("dump.out");
}

int main()
{
printf("No fsync()\n");
makefile(0);

printf("With fsync()\n");
makefile(1);

return 0;
}

The first operation shows how well the OS write cache is doing. The
second shows how poorly everything runs with fsync(), which is what Pg
and Oracle do.

My RAID produced the following, but was also running production when I
ran it:

No fsync()
Writing 4096 blocks sequentially
Took 1 seconds, avg 4096.000000 iops

Writing 4096 blocks (somewhat randomly)
Took 4 seconds, avg 1024.000000 iops

With fsync()
Writing 4096 blocks sequentially
Took 40 seconds, avg 102.400002 iops

Writing 4096 blocks (somewhat randomly)
Took 66 seconds, avg 62.060608 iops

When I ran this on a decent fibre channel drive, I got:

No fsync()
Writing 4096 blocks sequentially
Took 1 seconds, avg 4096.000000 iops

Writing 4096 blocks (somewhat randomly)
Took 7 seconds, avg 585.142883 iops

With fsync()
Writing 4096 blocks sequentially
Took 106 seconds, avg 38.641510 iops

Writing 4096 blocks (somewhat randomly)
Took 115 seconds, avg 35.617390 iops

You can see that the RAID array really helps out with small writes.

Browse pgsql-performance by date

  From Date Subject
Next Message V i s h a l Kashyap @ [Sai Hertz And Control Systems] 2004-06-02 03:16:59 PostgreSQL and Kernel 2.6.x
Previous Message Vitaly Belman 2004-06-01 22:56:07 PostgreSQL on VMWare vs Windows vs CoLinux