Re: Fwd: Re: SSDD reliability

From: Toby Corkindale <toby(dot)corkindale(at)strategicdata(dot)com(dot)au>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Fwd: Re: SSDD reliability
Date: 2011-05-06 02:02:12
Message-ID: 4DC356A4.8020004@strategicdata.com.au
Lists: pgsql-general

On 05/05/11 18:36, Florian Weimer wrote:
> * Greg Smith:
>
>> Intel claims their Annual Failure Rate (AFR) on their SSDs in IT
>> deployments (not OEM ones) is 0.6%. Typical measured AFR rates for
>> mechanical drives is around 2% during their first year, spiking to 5%
>> afterwards. I suspect that Intel's numbers are actually much better
>> than the other manufacturers here, so a SSD from anyone else can
>> easily be less reliable than a regular hard drive still.
>
> I'm a bit concerned with usage-dependent failures. Presumably, two SDDs
> in a RAID-1 configuration are weared down in the same way, and it would
> be rather inconvenient if they failed at the same point. With hard
> disks, this doesn't seem to happen; even bad batches fail pretty much
> randomly.

Actually, I think it will be the same as with hard disks, i.e. a batch
of drives with sequential serial numbers will have a fairly similar
average lifetime, but they won't all pop their clogs on the same day
(unless there is an outside influence; see note 1).

The wearing-out of SSDs is not as exact as people seem to think. If the
drive is rated for 10,000 erase cycles, that is meant to be a MINIMUM
figure. So most blocks will survive more cycles than that, and maybe a
small number will die sooner. I guess it's a probability curve,
engineered such that 95% or some other high percentage of blocks will
outlast the rated count. The SSDs also have reserved blocks which are
brought in to take over from failing blocks, invisibly to the end user,
since the drive can always still read from a block that is failing to
erase.
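
To put rough numbers on that probability-curve guess, here is a minimal
Python sketch. It is not based on any vendor data: the normal
distribution, the 15% spread and the block count are all assumptions
made up for illustration, and only the 10,000-cycle rating and the 95%
figure come from the paragraph above.

```python
from statistics import NormalDist

RATED_CYCLES = 10_000       # rating used as the example in this post
SURVIVAL_AT_RATING = 0.95   # the "95% or some other high percentage" guess
REL_SPREAD = 0.15           # ASSUMPTION: std dev is 15% of the mean
SAMPLE_BLOCKS = 20_000      # ASSUMPTION: number of simulated blocks


def endurance_distribution(rated, survival, rel_spread):
    """Normal distribution of per-block endurance such that a fraction
    `survival` of blocks outlasts `rated` erase cycles (hypothetical model)."""
    # z-score below which (1 - survival) of a standard normal falls (negative).
    z = NormalDist().inv_cdf(1.0 - survival)
    # Solve rated = mean + z * (rel_spread * mean) for the mean.
    mean = rated / (1.0 + z * rel_spread)
    return NormalDist(mean, rel_spread * mean)


dist = endurance_distribution(RATED_CYCLES, SURVIVAL_AT_RATING, REL_SPREAD)

# Fraction of blocks the model expects to die before the rated count.
early_fraction = dist.cdf(RATED_CYCLES)

# Draw per-block endurance for one simulated drive (fixed seed for repeatability).
blocks = dist.samples(SAMPLE_BLOCKS, seed=1)
early_blocks = sum(1 for cycles in blocks if cycles < RATED_CYCLES)

print(f"mean endurance: {dist.mean:.0f} cycles")
print(f"expected early failures: {early_fraction:.1%}")
print(f"simulated early failures: {early_blocks} of {SAMPLE_BLOCKS}")
```

Under these assumptions the average block has to last well beyond the
rated count for 95% of blocks to reach it, which is why the rating reads
as a floor rather than a cliff. Changing the seed gives a second "drive"
whose early-failure count differs a little from the first.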

Note 1:
I have seen an array that was powered on continuously for about six
years, which killed half the disks when it was finally powered down,
left to cool for a few hours, then started up again.
