Re: Reliability with RAID 10 SSD and Streaming Replication

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Reliability with RAID 10 SSD and Streaming Replication
Date: 2013-05-19 22:34:50
Message-ID: 5199538A.3020707@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Do you really need a running standby for fast failover? What about doing
plain WAL archiging? I'd definitely consider that, because even if you
setup a SAS-based replica, you can't use it for production as it does no
handle the load.

I think you could setup WAL archiving and in case of crash just use the
base backup and replay the WAL from the archive.

This means the SAS-based system is purely for WAL archiving, i.e.
performs only sequential writes which should not be a big deal.

The recovery will be performed on the SSD system, which should handle it
fine. If you need faster recovery, you may perform it incrementally on
the SAS system (it will take some time, but it won't influence the
master). You might do that daily or something like that.

The only problem with this is that this is file based, and could mean
lag (up to 16MB or archive_timeout). But this should not be problem if
you place the WAL on SAS drives with controller. If you use RAID, you
should be perfectly fine.

So this is what I'd suggest:

1) use SSD for data files, SAS RAID1 for WAL on the master
2) setup WAL archiving (base backup + archive on SAS system)
3) update the base backup daily (incremental recovery)
4) in case of crash, keep WAL from the archive and pg_xlog on the
SAS RAID (on master)

Tomas

On 17.5.2013 02:21, Cuong Hoang wrote:
> Hi Tomas,
>
> We have a lot of small updates and some inserts. The database size is at
> 35GB including indexes and TOAST. We think it will keep growing to about
> 200GB. We usually have a burst of about 500k writes in about 5-10
> minutes which basically cripples IO on the current servers. I've tried
> to increase the checkpoint_segments, checkpoint_timeout etc. as
> recommended in "PostgreSQL 9.0 Performance" book. However, it seems like
> our server just couldn't handle the current load.
>
> Here is the server specs:
>
> Dual E5620, 32GB RAM, 4x1TB SAS 15k in RAID10
>
> Here are some core PostgreSQL configs:
>
> shared_buffers = 2GB # min 128kB
> work_mem = 64MB # min 64kB
> maintenance_work_mem = 1GB # min 1MB
> wal_buffers = 16MB
> checkpoint_segments = 128
> checkpoint_timeout = 30min
> checkpoint_completion_target = 0.7
>
>
> Thanks,
> Cuong
>
>
> On Fri, May 17, 2013 at 10:06 AM, Tomas Vondra <tv(at)fuzzy(dot)cz
> <mailto:tv(at)fuzzy(dot)cz>> wrote:
>
> Hi,
>
> On 16.5.2013 16:46, Cuong Hoang wrote:
> > Hi all,
> >
> > Our application is heavy write and IO utilisation has been the problem
> > for us for a while. We've decided to use RAID 10 of 4x500GB
> Samsung 840
>
> What does "heavy write" mean in your case? Does that mean a lot of small
> transactions or few large ones?
>
> What have you done to tune the server?
>
> > Pro for the master server. I'm aware of write cache issue on SSDs in
> > case of power loss. However, our hosting provider doesn't offer any
> > other choices of SSD drives with supercapacitor. To minimise risk, we
> > will also set up another RAID 10 SAS in streaming replication
> mode. For
> > our application, a few seconds of data loss is acceptable.
>
> Streaming replication allows zero data loss if used in synchronous mode.
>
> > My question is, would corrupted data files on the primary server
> affect
> > the streaming standby? In other word, is this setup acceptable in
> terms
> > of minimising deficiency of SSDs?
>
> It should be.
>
> Have you considered using a UPS? That would make the SSDs about as
> reliable as SATA/SAS drives - the UPS may fail, but so may a BBU unit on
> the SAS controller.
>
> Tomas
>
>
> --
> Sent via pgsql-performance mailing list
> (pgsql-performance(at)postgresql(dot)org
> <mailto:pgsql-performance(at)postgresql(dot)org>)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Cuong Hoang 2013-05-20 02:41:32 Re: Reliability with RAID 10 SSD and Streaming Replication
Previous Message Tomas Vondra 2013-05-19 22:15:23 Re: Reliability with RAID 10 SSD and Streaming Replication