Re: Reliability with RAID 10 SSD and Streaming Replication

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrea Suisani <sickpig(at)opinioni(dot)net>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Reliability with RAID 10 SSD and Streaming Replication
Date: 2013-05-23 13:47:13
Message-ID: CAHyXU0wp-m3nD4S_+5_fuPRyK82QgOg=p-kh51wiLSk9_Mkqyw@mail.gmail.com
Lists: pgsql-performance

On Thu, May 23, 2013 at 1:56 AM, Andrea Suisani <sickpig(at)opinioni(dot)net> wrote:
> On 05/22/2013 03:30 PM, Merlin Moncure wrote:
>>
>> On Tue, May 21, 2013 at 7:19 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>>>
>>> On 5/20/13 6:32 PM, Merlin Moncure wrote:
>
>
> [cut]
>
>
>>> The only really huge gain to be had using SSD is commit rate at a low
>>> client count. There you can easily do 5,000/second instead of a spinning
>>> disk that is closer to 100, for less than what the battery-backed RAID
>>> card alone costs to speed up mechanical drives. My test server's 100GB
>>> DC S3700 was $250. That's still not two orders of magnitude faster though.
>>
>>
>> That's most certainly *not* the only gain to be had: random read rates
>> of large databases (a very important metric for data analysis) can
>> easily hit 20k tps. So I'll stand by the figure. Another point: that
>> 5,000/second commit rate is sustained, whereas a RAID card will degrade
>> spectacularly once its cache overflows; it's not fair to compare burst
>> with sustained performance. To hit a 5,000/second sustained commit rate
>> along with good random read performance, you'd need a very expensive
>> storage system. Right now I'm working (not by choice) with a tier-1
>> storage system (let's just say it rhymes with 'weefax') and I would
>> trade it for direct attached SSD in a heartbeat.
>>
>> Also, note that 3rd party benchmarking is showing the 3700 completely
>> smoking the 710 in database workloads (for example, see
>> http://www.anandtech.com/show/6433/intel-ssd-dc-s3700-200gb-review/6).
>
>
> [cut]
>
> Sorry for interrupting, but on a related note I would like to know your
> opinions on what the AnandTech review said about the S3700's poor
> performance on "Oracle Swingbench". I'm quoting the relevant part, which
> you can find here (*):
>
> <quote>
>
> [..] There are two components to the Swingbench test we're running here:
> the database itself, and the redo log. The redo log stores all changes that
> are made to the database, which allows the database to be reconstructed in
> the event of a failure. In good DB design, these two would exist on separate
> storage systems, but in order to increase IO we combined them both for this
> test.
> Accesses to the DB end up being 8KB and random in nature, a definite strong
> suit of the S3700 as we've already shown. The redo log however consists of
> a bunch of 1KB - 1.5KB, QD1, sequential accesses. The S3700, like many of
> the newer controllers we've tested, isn't optimized for low queue depth,
> sub-4KB, sequential workloads like this. [..]
>
> </quote>
>
> Does this kind of scenario apply to the PostgreSQL WAL files directory?

huh -- I don't think so. wal file segments are 8kb aligned, ditto
clog, etc. In XLogWrite():

/* OK to write the page(s) */
from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
nbytes = npages * (Size) XLOG_BLCKSZ;    /* <-- always whole XLOG_BLCKSZ pages */
errno = 0;
if (write(openLogFile, from, nbytes) != nbytes)
{
    /* ... error handling elided ... */
}

AFAICT, that's the only way you write out xlog. One thing I would
definitely advise though is to disable partial page writes if it's
enabled. The S3700 is aligned on 8kb blocks internally -- hm.

merlin
