Re: RAID controllers for Postgresql on large setups

From: PFC <lists(at)peufeu(dot)com>
To: "Francisco Reyes" <lists(at)stringsutils(dot)com>
Cc: "Pgsql performance" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: RAID controllers for Postgresql on large setups
Date: 2008-05-13 20:46:04
Message-ID: op.ua31a20rcigqcu@apollo13.peufeu.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

>
>> You say that like you don't mind having PCI in a server whose job is
>> to perform massive query over large data sets.
>
> I am in my 4th week at a new job. Trying to figure what I am working
> with.

LOOL, ok, hehe, not exactly the time to have a "let's change everything"
fit ;)

> From what I see I will likely get as much improvement from new hardware
> as from re-doing some of the database design. Can't get everything done
> at once, not to mention I have to redo one machine sooner rather than
> later so I need to prioritize.
>
>> In fact for bulk IO a box with 2 SATA drives would be just as fast as
>> your monster RAID, lol.
>
> I am working on setting up a standard test based on the type of
> operations that the company does. This will give me a beter idea.
> Specially I will work with the developers to make sure the queries I
> create for the benchmark are representative of the workload.

watching vmstat (or iostat) while running a very big seq scan query will
give you information about the reading speed of your drives.
Same for writes, during one of your big updates, watch vmstat, you'll
know if you are CPU bound or IO bound...

- one core at 100% -> CPU bound
- lots of free CPU but lots of iowait -> disk bound
- disk throughput decent (in your setup, 100 MB/s) -> PCI bus saturation
- disk throughput miserable (< 10 MB/s) -> random IO bound (either random
reads or fsync() or random writes depending on the case)

> In your opinion if we get a new machine with PCI-e, at how many spindles
> will the SCSI random access superiority start to be less notable?
> Specially given the low number of connections we usually have running
> against these machines.

Sorting of random reads depends on multiple concurrent requests (which
you don't have). Sorting of random writes does not depend on concurrent
requests so, you'll benefit on your updates. About SCSI vs SATA vs number
of spindles : can't answer this one.

> We are using one single SQL statement.

OK, so forget about fsync penalty, but do tune your checkpoints so they
are not happening all the time... and bgwriter etc.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message fernando castano 2008-05-14 04:40:40 can I move sort to first outer join ?
Previous Message James Mansion 2008-05-13 20:12:58 Re: RAID controllers for Postgresql on large setups