Stuck LSI 9650SE-12 RAID Controller

From: Craig James <cjames(at)emolecules(dot)com>
To: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Stuck LSI 9650SE-12 RAID Controller
Date: 2014-08-05 16:00:30
Message-ID: CAFwQ8rfsBZUSVEyhyZinBDJOQRkCjveVhLiKmHUkQEh2K+61nQ@mail.gmail.com
Lists: pgsql-admin

Has anyone seen anything like this?

Our LSI 9650SE-12 RAID controller dropped the main Postgres disk, an
8-disk RAID10 unit, offline. It just disappeared, as though the disk
wasn't there. The other unit (RAID1 for Linux and pg_xlog) was still
functional.

tw_cli showed the array as "DEGRADED" and claimed the controller was
verifying it. One disk in the array was "DEGRADED". There was no /dev
entry for the device; Linux couldn't see it at all.

There were two hot spares, but the controller didn't use them. Worse,
there was nothing I could do to make it do anything. Every command
reported "Failed" with no further explanation. Booting into the RAID BIOS
gave the same problem: if I selected "rebuild" or "verify", it said "You
must select an array..." even though I had selected the array. It was as
though the array didn't exist, yet it was shown.

I shut off the computer, unplugged the BBU from the RAID card and plugged
it back in, unplugged and reinserted all the SATA cables, and then
restarted. Exact same symptoms.

I finally gave up trying to recover the database (we had a backup server).
The RAID controller let me delete and recreate the degraded array, and now
everything seems fine. I can rebuild the Postgres database on the new unit.
But I've lost a HUGE amount of trust in the LSI 9650SE RAID controller
card.

Thanks,
Craig
