Re: OT - 2 of 4 drives in a Raid10 array failed - Any chance of recovery?

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Ow Mun Heng <ow(dot)mun(dot)heng(at)wdc(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: OT - 2 of 4 drives in a Raid10 array failed - Any chance of recovery?
Date: 2009-10-21 06:10:01
Message-ID: alpine.GSO.2.01.0910210155300.1418@westnet.com
Lists: pgsql-general

On Tue, 20 Oct 2009, Ow Mun Heng wrote:

> Raid10 is supposed to be able to withstand up to 2 drive failures if the
> failures are from different sides of the mirror. Right now, I'm not
> sure which drive belongs to which. How do I determine that? Does it
> depend on the output of /proc/mdstat and in that order?

You build a 4-disk RAID10 array on Linux by first building two RAID1
pairs, then striping both of the resulting /dev/mdX devices together via
RAID0. You'll actually have 3 /dev/mdX devices around as a result. I
suspect you're trying to execute mdadm operations on the outer RAID0, when
what you actually should be doing is fixing the bottom-level RAID1
volumes. Unfortunately I'm not too optimistic about your case though,
because if you had a repairable situation you technically shouldn't have
lost the array in the first place--it should still be running, just in
degraded mode on both underlying RAID1 halves.
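To make the "different sides of the mirror" rule concrete, here is a rough sketch of the check, assuming the nested layout described above (two RAID1 pairs under one RAID0). The function name raid10_survives and the device names in the usage line are made up for illustration; it only reasons about names you pass in and never touches a real array.

```shell
# Usage: raid10_survives "PAIR_A" "PAIR_B" FAILED...
# Each PAIR is a space-separated list of the two members of one RAID1 mirror.
# Returns 0 if every mirror still has at least one working member, which is
# the condition for the RAID0 striped on top of them to stay usable.
raid10_survives() {
    pair_a=$1
    pair_b=$2
    shift 2
    for pair in "$pair_a" "$pair_b"; do
        alive=0
        for member in $pair; do
            failed=0
            for f in "$@"; do
                if [ "$f" = "$member" ]; then
                    failed=1
                fi
            done
            if [ "$failed" -eq 0 ]; then
                alive=1
            fi
        done
        # A mirror with no working member takes the whole stripe down.
        if [ "$alive" -eq 0 ]; then
            return 1
        fi
    done
    return 0
}
```

For example, raid10_survives "sda1 sdb1" "sdc1 sdd1" sda1 sdc1 succeeds (one loss per mirror), while passing sda1 sdb1 as the failed drives does not, because both halves of the same mirror are gone.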

There's a good example of how to set one of these up at
http://www.sanitarium.net/golug/Linux_Software_RAID.html ; note how the
RAID10 there involves /dev/md{0,1,2,3} for the 6-disk volume.

Here's what will probably show you the parts you're trying to figure out:

mdadm --detail /dev/md0
mdadm --detail /dev/md1
mdadm --detail /dev/md2

That should give you an idea what md devices are hanging around and what's
inside of them.
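
If you'd rather get a one-line-per-array summary out of /proc/mdstat, something along these lines may help. It's a minimal sketch, not a tested tool: md_members is a name I made up, and it assumes the usual "mdN : active raidX member[n] ..." line format, so check its output against the raw file.

```shell
# Print one line per md array: name, RAID level, then its member devices.
# Reads /proc/mdstat by default; pass a path to inspect a saved copy instead.
md_members() {
    awk '$1 ~ /^md/ && $2 == ":" {
        level = "unknown"
        members = ""
        for (i = 3; i <= NF; i++) {
            if ($i ~ /^(raid[0-9]+|linear)$/)
                level = $i                      # personality, e.g. raid1
            else if ($i ~ /\[[0-9]+\]/)
                members = members " " $i        # e.g. sda1[0] or sdb1[1](F)
        }
        print $1, level members
    }' "${1:-/proc/mdstat}"
}
```

On a nested setup, the RAID0 line should list other md devices as its members, and each RAID1 line should list the two partitions in that mirror. A member marked (F) has been flagged as failed.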

One thing you don't see there is what devices were originally around if
they've already failed. I highly recommend saving a copy of the mdadm
detail (and "smartctl -i" for each underlying drive) on any production
server, to make it easier to answer questions like "what's the serial
number of the drive that failed in /dev/md0?".

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
