Re: RAID arrays and performance

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Matthew <matthew(at)flymine(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: RAID arrays and performance
Date: 2007-12-04 13:45:24
Message-ID: 475559F4.7060104@mark.mielke.cc
Lists: pgsql-performance

Matthew wrote:
> On Tue, 4 Dec 2007, Gregory Stark wrote:
>
>> "Matthew" <matthew(at)flymine(dot)org> writes:
>>> Does Postgres issue requests to each random access in turn, waiting for
>>> each one to complete before issuing the next request (in which case the
>>> performance will not exceed that of a single disc), or does it use some
>>> clever asynchronous access method to send a queue of random access
>>> requests to the OS that can be distributed among the available discs?
>>>
>> Sorry, it does the former, at least currently.
>> That said, this doesn't really come up nearly as often as you might think.
>>
> Shame. It comes up a *lot* in my project. A while ago we converted a task
> that processes a queue of objects to processing groups of a thousand
> objects, which sped up the process considerably. So we run an awful lot of
> queries with IN lists with a thousand values. They hit the indexes, then
> fetch the rows by random access. A full table sequential scan would take
> much longer. It'd be awfully nice to have those queries go twelve times
> faster.
>
A bitmap scan reads the table's pages in physical order, so it can
partly benefit from sequential reads. I'm not sure whether a bitmap
scan is optimal for your situation, or whether your access pattern
would let you take advantage of this.
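
To make the ordering idea concrete, here is a minimal Python sketch. It
is not PostgreSQL's implementation; the function names and the sample
tuple ids are made up for illustration. It groups (block, offset) tuple
ids by heap block and visits the blocks in ascending order, then
compares total head movement against visiting them in index order.

```python
from collections import defaultdict

def bitmap_order(tids):
    """Group (block, offset) tuple ids by block, blocks in ascending order."""
    pages = defaultdict(list)
    for block, offset in tids:
        pages[block].append(offset)
    return [(block, sorted(pages[block])) for block in sorted(pages)]

def seek_distance(blocks):
    """Sum of block-number jumps between consecutive reads (a crude cost model)."""
    return sum(abs(b - a) for a, b in zip(blocks, blocks[1:]))

# Hypothetical tuple ids in the order an index scan might return them.
index_order = [(907, 3), (12, 1), (455, 7), (12, 4), (908, 2), (13, 9)]

plan = bitmap_order(index_order)
unordered_cost = seek_distance([block for block, _ in index_order])
ordered_cost = seek_distance([block for block, _ in plan])
```

In this toy input each block is read once and neighbouring blocks (12
and 13, 907 and 908) end up adjacent, which is where the partial
sequential benefit comes from.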

>> Normally queries fit mostly in either the large batch query domain or the
>> small quick oltp query domain. For the former Postgres tries quite hard to do
>> sequential i/o which the OS will do readahead for and you'll get good
>> performance. For the latter you're normally running many simultaneous such
>> queries and the raid array helps quite a bit.
>>
> Having twelve discs will certainly improve the sequential IO throughput!
>
> However, if this was implemented (and I have *no* idea whatsoever how hard
> it would be), then large index scans would scale with the number of discs
> in the system, which would be quite a win, I would imagine. Large index
> scans can't be that rare!
>
Do you know that there is a problem, or are you speculating about one? I
think your case would be far more compelling if you could show a
problem. :-)

I would think that at a minimum, having 12 disks with RAID 0 or RAID 1+0
would allow your insane queries to run concurrently with up to 12 other
queries. Unless your insane query is the only query in use on the
system, I think you may be speculating about a nearly non-existent
problem. Just a suggestion...
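
As a rough back-of-the-envelope model of that argument (a sketch only:
the 8 ms per random read is an assumed figure, and the model assumes
reads spread evenly across the disks), each query that waits for one
read at a time keeps at most one disk busy:

```python
def random_reads_per_second(concurrent_queries, discs, seek_ms=8.0):
    """Idealized aggregate random-read rate.

    Assumes each query has exactly one outstanding read, so it can keep
    at most one disc busy, and that reads are spread evenly over discs.
    seek_ms is an assumed average time per random read.
    """
    busy_discs = min(concurrent_queries, discs)
    return busy_discs * (1000.0 / seek_ms)

one_query = random_reads_per_second(1, 12)
twelve_queries = random_reads_per_second(12, 12)
twenty_queries = random_reads_per_second(20, 12)
```

Under these assumptions a lone query gets the rate of a single disk,
while twelve or more concurrent queries can saturate all twelve.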

I recall talk of more intelligent table scanning algorithms, and of
using asynchronous I/O to benefit from RAID arrays, but the numbers
prepared to convince people that the change would have an effect have
been less than impressive.
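
For readers unfamiliar with the idea, keeping several random reads in
flight at once can be sketched at the application level as below. This
is only an illustration using a thread pool and os.pread on a Unix-like
system, not how PostgreSQL does or would do it; the function name and
the max_outstanding parameter are hypothetical, and on a small cached
temporary file it demonstrates the mechanism, not a speedup.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 8192  # PostgreSQL's default page size

def read_blocks_concurrently(fd, block_numbers, max_outstanding=12):
    """Read the given blocks with up to max_outstanding requests in flight."""
    def read_one(block):
        # pread takes an explicit offset, so threads do not share a file position.
        return os.pread(fd, BLOCK_SIZE, block * BLOCK_SIZE)

    with ThreadPoolExecutor(max_workers=max_outstanding) as pool:
        # map() returns results in the order the blocks were requested.
        return list(pool.map(read_one, block_numbers))

# Demo: build a 64-block file where every byte of block i equals i.
with tempfile.TemporaryFile() as f:
    for i in range(64):
        f.write(bytes([i]) * BLOCK_SIZE)
    f.flush()
    pages = read_blocks_concurrently(f.fileno(), [40, 3, 17, 59])
    first_bytes = [page[0] for page in pages]
```

With requests queued like this, the OS and the RAID controller are free
to service them on different disks in parallel.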

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>
