2009/3/22 Greg Smith <gsmith(at)gregsmith(dot)com>:
> On Fri, 20 Mar 2009, M. Edward (Ed) Borasky wrote:
>
>> I just discovered this on a LinkedIn user group:
>> http://bugzilla.kernel.org/show_bug.cgi?id=12309
>
> I would bet there's at least 3 different bugs in that one. That bug report
> got a lot of press via Slashdot a few months ago, and it's picked up all sorts
> of people who all have I/O wait issues, but they don't all have the same
> cause. The 3ware-specific problem Laurent mentioned is an example. That's
> not the same thing most of the people there are running into, the typical
> reporter there has disks attached directly to their motherboard. The irony
> here is that #12309 was a fork of #7372 to start over with a clean
> discussion slate because the same thing happened to that earlier one.
That I/O wait problem is not 3ware-specific. A friend of mine has the
same problem/fix with aacraid.
I'd bet a couple of coins that the controllers that show this problem do
not set MWI (Memory Write and Invalidate).
A quick grep of the Linux sources (2.6.28.8) for pci_try_set_mwi
(only disk controllers are shown here):
230:pata_cs5530.c
3442:sata_mv.c
2016:3w-9xxx.c
147:qla_init.c
2412:lpfc_init.c
171:cs5530.c
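
For anyone who wants to repeat that search on another kernel version,
here is a minimal Python sketch of the same idea. The function name
find_callers and the choice to look only at .c files are my own
assumptions, not anything from the kernel tree; a plain recursive grep
does the same job.

```python
import os

def find_callers(root, symbol="pci_try_set_mwi"):
    """Return (line number, relative path) for every .c line mentioning symbol.

    This is a plain substring match, like grep -n, so comments and
    declarations are reported as well as real calls.
    """
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".c"):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps odd encodings in old sources from aborting the scan
            with open(path, errors="replace") as fh:
                for lineno, line in enumerate(fh, 1):
                    if symbol in line:
                        hits.append((lineno, os.path.relpath(path, root)))
    # Sort by file name, then line number, for stable output
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))

if __name__ == "__main__":
    import sys
    # Usage: python find_callers.py /path/to/linux/drivers
    for lineno, relpath in find_callers(sys.argv[1]):
        print(f"{lineno}:{relpath}")
```

A driver missing from the output does not prove MWI is off on that
card, only that the driver does not ask for it through this call.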
>
> The original problem reported there showed up in 2.6.20, so I've been able
> to avoid this whole thing by sticking to the stock RHEL5 kernel (2.6.18) on
> most of the production systems I deal with. (Except for my system with an
> Areca card--that one needs 2.6.22 or later to be stable, and seems to have
> no unexpected I/O wait issues. I think this is because it's taking over the
> lowest level I/O scheduling from Linux, when it pushes from the card's cache
> onto the disks).
I thought about the Completely Fair Scheduler at first, but that one was
only merged in 2.6.23, after the problem first appeared.
Some tests were done with different I/O schedulers, and they do not
seem to be the real cause of the I/O wait.
A bad interaction between a hardware RAID card's cache and the system
wanting the card to write at the same time could be one reason.
Unfortunately, I've also hit it on a now-retired box at work that was
running a single disk plugged into the motherboard controller.
So, there's something else under the hood... but my (very) limited
kernel knowledge can't help more here.
>
> Some of the people there reported significant improvement by tuning the
> pdflush tunables; that's something I've had to do a few times on systems to get rid
> of unexpected write lulls. I wrote up a walkthrough on one of them at
> http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html that
> goes over how to tell if you're running into that problem, and what to do
> about it; something else I wrote on that already made it into the bug report
> in comment #150.
I think that forcing the system to write out smaller amounts of data
more often just hides the problem and doesn't correct it.
But well, that's just a feeling, not science. I hope some real hacker
will be able to spot the problem(s) so they can be fixed.
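
For reference, the pdflush tunables in question are, as far as I know,
the dirty_* files under /proc/sys/vm. Here is a small Python sketch
that just prints the current values, so before/after comparisons are
easy. The function name read_vm_tunables and the list of four files
are my own choice for illustration; a given kernel may expose more or
fewer.

```python
import os

# Writeback settings commonly tuned for pdflush on 2.6 kernels (assumed list)
TUNABLES = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)

def read_vm_tunables(base="/proc/sys/vm"):
    """Return {name: int value, or None if the file is missing or unreadable}."""
    values = {}
    for name in TUNABLES:
        try:
            with open(os.path.join(base, name)) as fh:
                values[name] = int(fh.read().strip())
        except (OSError, ValueError):
            values[name] = None
    return values

if __name__ == "__main__":
    for name, value in read_vm_tunables().items():
        print(f"{name} = {value}")
```

This only reads the settings; changing them is still a matter of
writing to the same files as root or using sysctl.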
Anyway, I'm keeping a couple of coins on MWI as a source of the problem :-)
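
If someone wants to check the MWI guess on their own box, the enable
flag is bit 4 (0x10) of the 16-bit PCI command register at offset 0x04
in config space. Here is a rough Python sketch that reads it from
sysfs. The helper names mwi_enabled and scan are made up for this
example, and I am assuming the usual /sys/bus/pci/devices layout.

```python
import glob
import os
import struct

PCI_COMMAND = 0x04             # offset of the 16-bit command register
PCI_COMMAND_INVALIDATE = 0x10  # bit 4: Memory Write and Invalidate enable

def mwi_enabled(config):
    """Return True if the MWI enable bit is set in raw config-space bytes."""
    if len(config) < PCI_COMMAND + 2:
        raise ValueError("config space too short")
    # PCI config space is little-endian
    (command,) = struct.unpack_from("<H", config, PCI_COMMAND)
    return bool(command & PCI_COMMAND_INVALIDATE)

def scan(sysfs="/sys/bus/pci/devices"):
    """Map each PCI address found under sysfs to its current MWI state."""
    result = {}
    for path in sorted(glob.glob(os.path.join(sysfs, "*", "config"))):
        try:
            with open(path, "rb") as fh:
                address = os.path.basename(os.path.dirname(path))
                result[address] = mwi_enabled(fh.read(64))
        except (OSError, ValueError):
            continue  # skip devices whose config space cannot be read
    return result

if __name__ == "__main__":
    for address, enabled in scan().items():
        print(address, "MWI on" if enabled else "MWI off")
```

This reports only the current state of the bit, so it says nothing
about whether turning MWI on would actually cure the I/O wait.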
Regards,
Laurent