From: | Toby Corkindale <toby(dot)corkindale(at)strategicdata(dot)com(dot)au> |
---|---|
To: | mark <dvlhntr(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Fwd: Re: SSDD reliability |
Date: | 2011-05-19 01:10:12 |
Message-ID: | 4DD46DF4.5090908@strategicdata.com.au |
Lists: | pgsql-general |
On 19/05/11 10:50, mark wrote:
>> Note 1:
>> I have seen an array that was powered on continuously for about six
>> years, which killed half the disks when it was finally powered down,
>> left to cool for a few hours, then started up again.
>>
>
>
> Recently we rebooted about 6 machines that had uptimes of 950+ days.
> Last time fsck had run on the file systems was 2006.
>
> When stuff gets that old, has been on-line and under heavy load all that
> time you actually get paranoid about reboots. In my newly reaffirmed
> opinion, at that stage reboots are at best a crap shoot. We lost several
> hours to that gamble more than we had budgeted for. HP is getting more of
> their gear back than in a usual month.
I worked at one place, years ago, that had an odd policy. They had
automated hard resets hit all their servers on a Friday night, every week.
I thought they were mad at the time!
But it does mean that people design and test their systems so that they
can survive unattended resets reliably. (No one wants to get a support
call at 11pm on Friday because their server didn't come back up.)
It still seems a bit messed up, though. Even if Friday night is a
low-use period, it still causes a small amount of disruption to
customers, especially if a developer or sysadmin has messed up and a
server *doesn't* come back up.