Re: Reliability with RAID 10 SSD and Streaming Replication

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, sthomas(at)optionshouse(dot)com, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Reliability with RAID 10 SSD and Streaming Replication
Date: 2013-05-22 19:51:45
Message-ID: CAHyXU0wUHDiGVWgejxsFNApiP0sUcf8eLOy1Dwy-1pboBuxLeQ@mail.gmail.com
Lists: pgsql-performance

On Wed, May 22, 2013 at 2:30 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> On 5/22/13 3:06 PM, Joshua D. Drake wrote:
>>
>> Greg, can you elaborate on the SSD + Xlog issue? What type of burn
>> through are we talking about?
>
>
> You're burning through flash cells at a multiple of the total WAL write
> volume. The system I gave iostat snapshots from upthread (with the Intel
> 710 hitting its limit) archives about 1TB of WAL each week. The actual
> amount of WAL written in terms of erased flash blocks is even higher though,
> because sometimes the flash is hit with partial page writes. The write
> amplification of WAL is much worse than the main database.
>
> I gave a rough intro to this on the Intel drives at
> http://blog.2ndquadrant.com/intel_ssds_lifetime_and_the_32/ and there's a
> nice "Write endurance" table at
> http://www.tomshardware.com/reviews/ssd-710-enterprise-x25-e,3038-2.html
>
> The cheapest of the Intel SSDs I have here only guarantees 15TB of total
> write endurance. Eliminating >1TB of writes per week by moving the WAL off
> SSD is a pretty significant change, even though the burn rate isn't a simple
> linear thing--you won't burn the flash out in only 15 weeks.
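Greg's rough figure (15 TB of rated endurance against more than 1 TB of WAL per week) comes from a simple linear model. A minimal Python sketch of that model follows. It assumes a constant burn rate and adds a hypothetical write-amplification multiplier. As the post itself says, real flash wear is not linear, so treat the result as a first cut only.

```python
def naive_weeks_to_exhaust(endurance_tb: float, weekly_writes_tb: float,
                           write_amplification: float = 1.0) -> float:
    """Weeks until rated endurance is used up, assuming a constant,
    linear burn rate (which the post notes is not how flash really wears).

    write_amplification is a hypothetical multiplier: physical flash
    writes = host writes * write_amplification.
    """
    if weekly_writes_tb <= 0 or write_amplification <= 0:
        raise ValueError("write volume and amplification must be positive")
    return endurance_tb / (weekly_writes_tb * write_amplification)


if __name__ == "__main__":
    # 15 TB rated endurance, ~1 TB of WAL per week, no amplification.
    print(naive_weeks_to_exhaust(15, 1))        # 15.0
    # Same drive with a hypothetical write amplification factor of 2.
    print(naive_weeks_to_exhaust(15, 1, 2.0))   # 7.5
```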

Certainly, the Intel 320 is not designed for 1 TB/week workloads.

> The production server is actually using the higher grade 710 drives that aim
> for 900TB instead. But I do have standby servers using the low grade stuff,
> so anything I can do to decrease SSD burn rate without dropping performance
> is useful. And only the top tier of transaction rates will outrun a RAID1
> pair of 15K drives dedicated to WAL.

The S3700 is rated for 10 drive writes/day for 5 years. So for a 200 GB drive,
that's 200 GB * 10/day * 365 days * 5 = 3.65 million gigabytes, or ~3.5 petabytes.
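That figure is capacity times the drive-writes-per-day (DWPD) rating times the number of days in the rated period. Below is a small Python sketch of that calculation; the function name is hypothetical. It also shows that the "~3.5 petabytes" figure matches reading the units as binary (GiB/PiB). Decimal units give 3.65 PB.

```python
def rated_endurance_gb(capacity_gb: float, drive_writes_per_day: float,
                       warranty_years: float) -> float:
    """Total rated write volume implied by a DWPD rating (365-day years)."""
    return capacity_gb * drive_writes_per_day * 365 * warranty_years


if __name__ == "__main__":
    total_gb = rated_endurance_gb(200, 10, 5)
    print(total_gb)                       # 3650000
    print(total_gb / 1000**2)             # PB, decimal units: 3.65
    print(round(total_gb / 1024**2, 2))   # PiB, if "GB" means GiB: 3.48
```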

At 1 TB/week it would take about 67 years to burn through that, divided by
whatever write amplification you assume, and divided again by whatever extra
penalty you apply if you are shooting for a > 5 year duty cycle (flash
degrades faster the older it is), and that's *for a single 200 GB device*.
Write endurance is not a problem for this drive. In fact, it's a very
reasonable assumption that its faster worst-case random performance is
directly related to reduced write amplification. BTW, the cost per PB of
rated write endurance for this drive is less than half that of the 710
(which IMO was obsolete the day the S3700 hit the street).
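The burn-through estimate is rated endurance divided by the effective weekly write volume. In this Python sketch, `write_amplification` and `aging_penalty` are hypothetical multipliers standing in for the "whatever you assume" factors in the text. The 52-weeks-per-year constant is also an assumption. The rounded 3.5 PB figure reproduces the ~67 years quoted. The unrounded 3.65 PB gives about 70.

```python
WEEKS_PER_YEAR = 52  # assumption: simple 52-week year


def years_to_exhaust(rated_tb: float, weekly_host_writes_tb: float,
                     write_amplification: float = 1.0,
                     aging_penalty: float = 1.0) -> float:
    """Years of service before rated endurance is consumed.

    write_amplification and aging_penalty are hypothetical multipliers on
    the host write volume; the post leaves their values to the reader.
    """
    effective_weekly_tb = (weekly_host_writes_tb
                           * write_amplification * aging_penalty)
    if effective_weekly_tb <= 0:
        raise ValueError("effective weekly writes must be positive")
    return rated_tb / effective_weekly_tb / WEEKS_PER_YEAR


if __name__ == "__main__":
    print(round(years_to_exhaust(3500, 1), 1))   # rounded 3.5 PB: 67.3
    print(round(years_to_exhaust(3650, 1), 1))   # unrounded 3.65 PB: 70.2
    # Hypothetical: write amplification of 3 and a 1.5x aging penalty.
    print(round(years_to_exhaust(3500, 1, 3.0, 1.5), 1))  # 15.0
```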

merlin
