Re: Reliability with RAID 10 SSD and Streaming Replication

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Cuong Hoang <climbingrose(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Reliability with RAID 10 SSD and Streaming Replication
Date: 2013-05-16 18:34:40
Message-ID: CAMkU=1xKCdJ0hdBqu66z2Fhf-WELoNZ+CZYKdD3wHCdO49yZ+A@mail.gmail.com
Lists: pgsql-performance

On Thu, May 16, 2013 at 7:46 AM, Cuong Hoang <climbingrose(at)gmail(dot)com> wrote:

> Hi all,
>
> Our application is write-heavy and IO utilisation has been a problem for
> us for a while. We've decided to use RAID 10 of 4x500GB Samsung 840 Pro for
> the master server. I'm aware of the write cache issue on SSDs in case of power
> loss. However, our hosting provider doesn't offer any other choices of SSD
> drives with supercapacitor. To minimise risk, we will also set up another
> RAID 10 SAS in streaming replication mode. For our application, a few
> seconds of data loss is acceptable.
>
> My question is, would corrupted data files on the primary server affect
> the streaming standby? In other words, is this setup acceptable in terms of
> minimising the deficiencies of SSDs?
>

That seems rather scary to me for two reasons.

If the data center has a sudden power failure, why would it not take out
both machines either simultaneously or in short succession? Can you verify
that the hosting provider does not have them on the same UPS (or even
worse, as two virtual machines on the same physical host)?

The other issue is that you'd have to make sure PostgreSQL on the master is
not started again automatically after a crash. If your init.d scripts just
blindly start postgresql, then after a sudden OS restart it will
automatically enter crash recovery and then open as usual, even though it
might be silently corrupt. At that point it will be returning incorrect
query results, generating WAL based on corrupt data, and propagating that
WAL to the standby. So you have to be paranoid: if the master ever crashes,
it is shot in the head and then reconstructed from the standby.
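
As a rough illustration of "don't blindly start", here is a minimal sketch
of a guard that a start script could call before launching the server. It
is only a sketch under some assumptions: pg_controldata is on the PATH, its
output is read in the C locale, and the function names (cluster_state,
safe_to_start, read_controldata) are made up for this example. It only
detects that the last shutdown was not clean; it cannot tell whether the
data files are actually corrupt.

```python
import os
import subprocess

# State that pg_controldata reports after a clean shutdown of a primary.
# Anything else (for example "in production" after a power loss) means
# the server did not shut down cleanly.
CLEAN_STATE = "shut down"


def cluster_state(controldata_output):
    """Return the value of the 'Database cluster state' line, or None."""
    for line in controldata_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Database cluster state":
            return value.strip()
    return None


def safe_to_start(controldata_output):
    """True only if the control file records a clean shutdown."""
    return cluster_state(controldata_output) == CLEAN_STATE


def read_controldata(data_dir):
    """Run pg_controldata on data_dir (hypothetical helper for this sketch)."""
    env = dict(os.environ, LC_ALL="C")  # keep the field labels in English
    result = subprocess.run(
        ["pg_controldata", data_dir],
        capture_output=True,
        text=True,
        check=True,
        env=env,
    )
    return result.stdout
```

A start script could then call safe_to_start(read_controldata(data_dir))
and refuse to launch the server, leaving the machine for a human to fence
and rebuild from the standby, whenever it returns False.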

Cheers,

Jeff
