Re: RAID arrays and performance

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>
Cc: "Matthew" <matthew(at)flymine(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: RAID arrays and performance
Date: 2007-12-04 14:53:43
Message-ID: 87fxyiik94.fsf@oxford.xeocode.com
Lists: pgsql-performance


"Mark Mielke" <mark(at)mark(dot)mielke(dot)cc> writes:

> Matthew wrote:
>
>> I don't think you would have to create a more intelligent table scanning
>> algorithm. What you would need to do is take the results of the index,
>> convert that to a list of page fetches, then pass that list to the OS as
>> an asynchronous "please fetch all these into the buffer cache" request,
>> then do the normal algorithm as is currently done. The requests would then
>> come out of the cache instead of from the disc. Of course, this is from a
>> simple Java programmer who doesn't know the OS interfaces for this sort of
>> thing.
>
> That's about how the talk went. :-)
>
> The problem is that a 12X speed for 12 disks seems unlikely except under very
> specific loads (such as a sequential scan of a single table). Each of the
> indexes may need to be scanned or searched in turn, then each of the tables
> would need to be scanned or searched in turn, depending on the query plan.
> There is no guarantee that the index rows or the table rows are equally spread
> across the 12 disks. CPU processing becomes involved, and this is currently limited
> to a single processor thread. I suspect no database would achieve a 12X speedup
> for 12 disks unless a simple sequential scan of a single table was required, in
> which case the reads could be fully parallelized with RAID 0 using standard
> sequential reads, and this is available today using built-in OS or disk
> read-ahead.

I'm sure you would get something between 1x and 12x though...

I'm rerunning my synthetic readahead tests now. They don't show the effect of
the other cpu and i/o work being done in the meantime, but surely if the
prefetched pages are being evicted from cache too soon, that just means your
machine is starved for cache and you should add more RAM?
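
For reference, the OS interface Matthew alludes to above maps fairly naturally
onto posix_fadvise(POSIX_FADV_WILLNEED) on most unixen: you hand the kernel a
list of offsets you intend to read and it schedules the i/o in the background.
A minimal sketch, assuming an already-open heap file and a block list from the
index (prefetch_blocks and the 8k BLCKSZ here are just placeholders, not the
real executor interface):

    /* Advise the kernel to pull a list of heap blocks into its page cache.
     * The calls are purely advisory and return immediately; the reads happen
     * asynchronously, so later read()s should hit the cache. */
    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    #define BLCKSZ 8192                 /* Postgres-style 8k blocks (placeholder) */

    void prefetch_blocks(int fd, const unsigned *blocks, int nblocks)
    {
        for (int i = 0; i < nblocks; i++)
        {
            off_t off = (off_t) blocks[i] * BLCKSZ;
            int   rc  = posix_fadvise(fd, off, BLCKSZ, POSIX_FADV_WILLNEED);

            if (rc != 0)                /* returns an errno value, not -1 */
                fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));
        }
    }

    int main(int argc, char **argv)
    {
        unsigned blocks[] = { 17, 3, 2048, 9 };   /* pretend the index gave us these */
        int      fd;

        if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
            return 1;
        prefetch_blocks(fd, blocks, 4);
        return 0;
    }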

Also, it's true, you need to preread more than 12 blocks to handle a 12-disk
raid. My offhand combinatorics analysis seems to indicate you would expect to
need to read n(n-1)/2 blocks on average before you've hit all n disks. There's
little penalty to prereading unless you use up too many kernel resources or
you do unnecessary i/o which you never use, so I would expect prereading n^2
blocks, capped at some reasonable number like 1,000 pages (enough to handle a
32-disk raid), would be reasonable.
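
To put numbers on that: for n = 12 disks, n(n-1)/2 = 66 blocks and n^2 = 144
blocks; for n = 32, n^2 = 1024, which is roughly where the 1,000-page cap
comes from.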

The real trick is avoiding prefetches that are never needed. The user may never
actually read all the tuples being requested. I think that means we shouldn't
prefetch until the second tuple is read, and then gradually increase the
prefetch distance as you read more and more of the results.
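
Concretely, the ramp-up heuristic could look something like the sketch below.
This is only an illustration under my own assumptions: the names (scan_state,
advance_prefetch, MAX_PREFETCH) are invented, and the real thing would have to
live inside the index/bitmap scan code rather than stand alone:

    /* Sketch: ramp the prefetch distance up as the caller keeps asking for
     * tuples, so a query that stops after one row never pays for speculative
     * i/o.  All names here are made up for illustration. */
    #define _XOPEN_SOURCE 600
    #include <fcntl.h>

    #define BLCKSZ       8192
    #define MAX_PREFETCH 1000       /* the cap discussed above */

    struct scan_state
    {
        int       fd;               /* heap file */
        unsigned *blocks;           /* block numbers from the index, in fetch order */
        int       nblocks;
        int       next_fetch;       /* next block we will actually read */
        int       next_prefetch;    /* next block we will advise the kernel about */
        int       distance;         /* current prefetch distance, starts at 0 */
    };

    /* Call this each time the executor is about to fetch blocks[next_fetch]. */
    void advance_prefetch(struct scan_state *s)
    {
        /* No prefetch at all for the first tuple ... */
        if (s->next_fetch == 0)
            return;

        /* ... then double the distance each time, up to the cap. */
        if (s->distance == 0)
            s->distance = 1;
        else if (s->distance < MAX_PREFETCH)
            s->distance = (s->distance * 2 < MAX_PREFETCH) ? s->distance * 2
                                                           : MAX_PREFETCH;

        while (s->next_prefetch < s->nblocks &&
               s->next_prefetch < s->next_fetch + s->distance)
        {
            off_t off = (off_t) s->blocks[s->next_prefetch] * BLCKSZ;

            (void) posix_fadvise(s->fd, off, BLCKSZ, POSIX_FADV_WILLNEED);
            s->next_prefetch++;
        }
    }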

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
