From: | Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> |
---|---|
To: | Greg Smith <greg(at)2ndquadrant(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: SSDD reliability |
Date: | 2011-05-05 20:39:32 |
Message-ID: | BANLkTimDpNW_4DuCj8Lto_=Euh3NueFAiA@mail.gmail.com |
Lists: | pgsql-general |
On Thu, May 5, 2011 at 1:54 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> I think your faith in PC component manufacturing is out of touch with the
> actual field failure rates for this stuff, which is produced with enormous
> cost cutting pressure driving tolerances to the bleeding edge in many cases.
> The equipment of the 80's and 90's you were referring to ran slower, and
> was more expensive so better quality components could be justified. The
> quality trend at the board and component level has been trending for a long
> time toward cheap over good in almost every case nowadays.
Modern CASE tools make this more and more of an issue. You can be in
a circuit design program, right click on a component and pick from a
dozen other components with lower tolerances and get a SPICE
simulation that says initial production line failure rates will go
from 0.01% to 0.02%. Multiply that by 100 components and it still seems
like a small change. But all it takes is one misstep and you've got a
board with a theoretical production line failure rate of 0.05 that's
really 0.08, and the first-year failure rate goes from 0.5% to 2% or 3%,
and the $2.00 you saved on all components on the board times 1M units
goes right out the window.
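To put rough numbers on the paragraph above, here is a minimal sketch. It assumes the 0.01% and 0.02% figures are per-component rates and that components fail independently; neither assumption is stated outright in the message. The function names are hypothetical and exist only for illustration.

```python
def board_failure_rate(per_component_rate: float, n_components: int) -> float:
    """Probability that at least one of n components fails.

    Assumption: component failures are independent and share one rate.
    """
    return 1.0 - (1.0 - per_component_rate) ** n_components


def breakeven_cost_per_failure(
    savings_per_unit: float, units: int, base_rate: float, new_rate: float
) -> float:
    """Cost per extra field failure at which the component savings are used up."""
    total_savings = savings_per_unit * units
    extra_failures = (new_rate - base_rate) * units
    return total_savings / extra_failures


if __name__ == "__main__":
    # Figures taken from the message: 100 components, 0.01% vs 0.02% each.
    for rate in (0.0001, 0.0002):
        print(f"per-component {rate:.4%} -> board {board_failure_rate(rate, 100):.3%}")

    # $2.00 saved per board, 1M units, first-year failures rising from 0.5% to 2% or 3%.
    for new_rate in (0.02, 0.03):
        cost = breakeven_cost_per_failure(2.00, 1_000_000, 0.005, new_rate)
        print(f"first-year rate {new_rate:.0%}: break-even at ${cost:.2f} per extra failure")
```

Under those assumptions the board-level rate roughly doubles, from about 1% to about 2%, and the $2M saved is gone once each extra first-year failure costs more than somewhere between about $80 and $133 to handle.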
BTW, the common term we used for things that fail due to weird and
unforeseen circumstances was P.O.M. dependent (phase of the moon),
because the failures would often cluster around certain operating
conditions that were unobvious until you collected and collated a
large enough data set. Like hard drives that have abnormally high
failure rates at altitudes above 4500ft, etc. They seem fine til you
order 1,000 for your Denver data center and they all start failing.
It could be anything like that: SSDs that operate fine until they're
in an environment with constant humidity below 15%, and boom, they
start failing like mad. It's impossible to test for
all conditions in the field, and it's quite possible that
environmental factors affect some of these SSDs we've heard about.
More research is necessary to say why someone would see such
clustering though.