| From: | Gregory Stark <stark(at)enterprisedb(dot)com> | 
|---|---|
| To: | "Matthew" <matthew(at)flymine(dot)org> | 
| Cc: | "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>, <pgsql-performance(at)postgresql(dot)org> | 
| Subject: | Re: RAID arrays and performance | 
| Date: | 2008-01-29 15:52:20 | 
| Message-ID: | 877ihsvdcb.fsf@oxford.xeocode.com | 
| Lists: | pgsql-performance | 
"Matthew" <matthew(at)flymine(dot)org> writes:
> On Tue, 29 Jan 2008, Gregory Stark wrote:
>>> This was with 8192 random requests of size 8192 bytes from an 80GB test file.
>>> Unsorted requests ranged from 1.8 MB/s with no prefetching to 28MB/s with lots
>>> of prefetching. Sorted requests went from 2.4MB/s to 38MB/s. That's almost
>>> exactly 16x improvement for both, and this is top of the line hardware.
>>
>> Neat. The curves look very similar to mine. I also like that with your
>> hardware the benefit maxes out at pretty much exactly where I had
>> mathematically predicted they would ((stripe size)^2 / 2).
>
> Why would that be the case? Does that mean that we can select a stripe size of
> 100GB and get massive performance improvements? Doesn't seem logical to me. To
> me, it maxes out at 16x speed because there are 16 discs.
Sorry, I meant "number of drives in the array" not number of bytes. So with 16
drives you would need approximately 128 random pending i/o operations to
expect all drives to be busy at all times.
I got this from a back-of-the-envelope calculation which, now that I'm trying
to reproduce it, seems to be wrong. Previously I thought it was n(n+1)/2 or
about n^2/2. So at 16 I would have expected about 128 pending i/o requests
before all the drives could be expected to be busy.
Now that I'm working it out more carefully I'm getting that the expected
number of pending i/o requests before all drives are busy is
 n + n/2 + n/3 + ... + n/n
which is actually n * H(n), where H(n) is the nth harmonic number, and which is
approximated by n * log(n) using the natural log.
That would predict that 16 drives would actually max out at 44.4 pending i/o
requests. It would predict that my three-drive array would max out well below
that at 7.7 pending i/o requests. Empirically neither result seems to match
reality. Other factors must be dominating.
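As a rough check on the sum above, here is a minimal Python sketch. It assumes each pending request lands on a uniformly random drive, independently of the others, and asks how many requests it takes before every drive has at least one. The function names are illustrative and not part of the original message. With these assumptions, n * log(n) for 16 drives comes out near the 44.4 quoted above, while the exact n * H(n) is somewhat larger, so the approximation is loose at this size.

```python
import math
import random


def expected_requests(n):
    """Exact value of n + n/2 + ... + n/n, i.e. n * H(n)."""
    return n * sum(1.0 / k for k in range(1, n + 1))


def simulate(n, trials=2000, seed=0):
    """Monte Carlo estimate: issue requests to uniformly random drives
    until every one of the n drives has received at least one."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        busy = set()
        count = 0
        while len(busy) < n:
            busy.add(rng.randrange(n))
            count += 1
        total += count
    return total / trials


if __name__ == "__main__":
    for n in (3, 16):
        print(
            n,
            round(expected_requests(n), 1),  # exact n * H(n)
            round(n * math.log(n), 1),       # n * ln(n) approximation
            round(simulate(n), 1),           # simulated average
        )
```

This models only the "every drive has been hit at least once" event. It ignores requests completing and queues draining, which may be part of why the prediction does not match the measurements.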
> Amusingly, there appears to be a spam filter preventing my message (with its
> image) getting through to the performance mailing list.
This has been plaguing us for a while. When we figure out whose misconfigured
system is doing it I expect they'll be banned from the internet for life!
-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!