Re: Huge iowait during checkpoint finish

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Anton Belyaev <anton(dot)belyaev(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Huge iowait during checkpoint finish
Date: 2010-01-11 21:59:11
Message-ID: 4B4B9F2F.4030504@2ndquadrant.com
Lists: pgsql-general

Scott Marlowe wrote:
> On Mon, Jan 11, 2010 at 3:53 AM, Anton Belyaev <anton(dot)belyaev(at)gmail(dot)com> wrote:
>
>> Old RAID-1 has "hardware" LSI controller.
>> I still have access to old server.
>>
>
> The old RAID card likely had a battery backed cache, which would make
> the fsyncs much faster, as long as you hadn't run out of cache.
>

To be super clear here: it's possible to see a 100:1 performance drop
going from a system with a battery-backed write cache to one without
one. This is one of the three main weak spots of software RAID that
still keep hardware RAID vendors in business: it can't do anything to
speed up the type of writes done during transaction commit and at
checkpoint time. (The others are that it's hard to set up transparent
failover after failure in software RAID so that it always works at boot
time, and that motherboard chipsets can easily lose their minds and take
down the whole system when one drive goes bad).

> If you can shoehorn one more drive, you could run RAID-10 and get much
> better performance.
>
And throwing drives at the problem may not help. I've seen a system with
a 48-disk software RAID-10 that only got 100 TPS when running a
commit-heavy test, because it didn't have any way to cache writes
usefully for that purpose.
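
As a rough illustration of why adding spindles does not raise that number:
without a write cache, a single client that waits for each commit to reach
the platter can finish at most about one commit per disk rotation. The
sketch below just does that arithmetic. The function name and the RPM
values are illustrative assumptions, not figures from the system described
above, and the bound applies to one client committing serially, not to
many clients whose commits can be grouped.

```python
def max_serial_commits_per_second(rpm):
    """Approximate upper bound on commits/sec for one client on an
    uncached rotating disk: one synchronous write per rotation."""
    rotations_per_second = rpm / 60.0
    return rotations_per_second


if __name__ == "__main__":
    # Common drive speeds, chosen only as examples.
    for rpm in (7200, 10000, 15000):
        bound = max_serial_commits_per_second(rpm)
        print(f"{rpm} RPM: about {bound:.0f} commits/sec at most")
```

Striping across more drives improves throughput for large or parallel
I/O, but each individual commit still waits on one synchronous write.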

If the old system had a write caching card, and the new one doesn't,
that's certainly your most likely suspect for the source of the
slowdown. As for testing that specifically, if you have the old system
too you can look at the slides I've got for "Database Hardware
Benchmarking" at
http://www.westnet.com/~gsmith/content/postgresql/index.htm and use the
sysbench example I show on P26 to measure commit fsync rate. There's a
video of that presentation where I explain a lot of the background in
this area too.
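
For readers without the slides at hand, here is a minimal sketch of the
same kind of measurement. It is not the sysbench example from the
presentation; it is a stand-in that rewrites one small block and calls
fsync in a loop, then reports how many syncs per second completed. The
function name, the 8 kB block size, and the default duration are
assumptions made for illustration. Run it against a directory on the
volume that holds the database on each server and compare the results.

```python
import os
import sys
import tempfile
import time


def measure_fsync_rate(directory, duration=5.0, block_size=8192):
    """Rewrite one block and fsync it repeatedly for about `duration`
    seconds; return the number of completed fsyncs per second."""
    block = b"\0" * block_size
    # Temporary file on the volume under test; removed afterwards.
    fd, path = tempfile.mkstemp(dir=directory)
    count = 0
    try:
        start = time.monotonic()
        while time.monotonic() - start < duration:
            os.lseek(fd, 0, os.SEEK_SET)  # overwrite the same block
            os.write(fd, block)
            os.fsync(fd)                  # wait until the OS reports it durable
            count += 1
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{measure_fsync_rate(target):.0f} fsyncs/sec in {target}")
```

A result in the low hundreds or below on rotating disks suggests every
sync is waiting on the platter, while a much higher figure suggests a
write cache is absorbing the syncs.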

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com
