Re: Fwd: Re: SSDD reliability

From: Toby Corkindale <toby(dot)corkindale(at)strategicdata(dot)com(dot)au>
To: mark <dvlhntr(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Fwd: Re: SSDD reliability
Date: 2011-05-19 01:10:12
Message-ID: 4DD46DF4.5090908@strategicdata.com.au
Lists: pgsql-general

On 19/05/11 10:50, mark wrote:
>> Note 1:
>> I have seen an array that was powered on continuously for about six
>> years, which killed half the disks when it was finally powered down,
>> left to cool for a few hours, then started up again.
>>
>
>
> Recently we rebooted about 6 machines that had uptimes of 950+ days.
> Last time fsck had run on the file systems was 2006.
>
> When stuff gets that old, has been on-line and under heavy load all that
> time you actually get paranoid about reboots. In my newly reaffirmed
> opinion, at that stage reboots are at best a crap shoot. We lost several
> hours to that gamble more than we had budgeted for. HP is getting more of
> their gear back than in a usual month.

I worked at one place, years ago, which had an odd policy: they had
automated hard resets hit all their servers on a Friday night, every week.
I thought they were mad at the time!

But it does mean that people design and test the systems so that they
can survive unattended resets reliably. (No one wants to get a support
call at 11pm on Friday because their server didn't come back up.)

It still seems a bit messed up though - even if Friday night is a
low-use period, it still means causing a small amount of disruption to
customers - especially if a developer or sysadmin messed up, and a
server *doesn't* come back up.
