Re: New server: SSD/RAID recommendations?

From: "Mkrtchyan, Tigran" <tigran(dot)mkrtchyan(at)desy(dot)de>
To: "Graeme B(dot) Bell" <graeme(dot)bell(at)nibio(dot)no>
Cc: Steve Crawford <scrawford(at)pinpointresearch(dot)com>, "Wes Vaske (wvaske)" <wvaske(at)micron(dot)com>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: New server: SSD/RAID recommendations?
Date: 2015-07-07 10:56:53
Message-ID: 1052663055.3807152.1436266613895.JavaMail.zimbra@desy.de
Lists: pgsql-performance

----- Original Message -----
> From: "Graeme B. Bell" <graeme(dot)bell(at)nibio(dot)no>
> To: "Mkrtchyan, Tigran" <tigran(dot)mkrtchyan(at)desy(dot)de>
> Cc: "Graeme B. Bell" <graeme(dot)bell(at)nibio(dot)no>, "Steve Crawford" <scrawford(at)pinpointresearch(dot)com>, "Wes Vaske (wvaske)"
> <wvaske(at)micron(dot)com>, "pgsql-performance" <pgsql-performance(at)postgresql(dot)org>
> Sent: Tuesday, July 7, 2015 12:38:10 PM
> Subject: Re: [PERFORM] New server: SSD/RAID recommendations?

> I am unsure about the performance side, but ZFS is generally very attractive to
> me.
>
> Key advantages:
>
> 1) Checksumming and automatic fixing-of-broken-things on every file (not just
> postgres pages, but your scripts, O/S, program files).
> 2) Built-in lightweight compression (doesn't help with TOAST tables, in fact
> may slow them down, but helpful for other things). This may actually be a net
> negative for pg so maybe turn it off.
> 3) ZRAID mirroring or ZRAID5/6. If you have trouble persuading someone that it's
> safe to replace a RAID array with a single drive... you can use a couple of
> NVMe SSDs with ZFS mirror or zraid, and get the same availability you'd get
> from a RAID controller. Slightly better, arguably, since they claim to have
> fixed the raid write-hole problem.
> 4) filesystem snapshotting
>
> Despite the costs of checksumming etc., I suspect ZRAID running on a fast CPU
> with multiple NVMe drives will outperform quite a lot of the alternatives, with
> great data integrity guarantees.

We are planning a test setup as well. For now I have a single NVMe SSD in my
test system:

# lspci | grep NVM
85:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller 171X (rev 03)

# mount | grep nvm
/dev/nvme0n1p1 on /var/lib/pgsql/9.5 type ext4 (rw,noatime,nodiratime,data=ordered)

and I am quite happy with it. We run a write-heavy workload on it to see when it
will break. Postgres performs very well: about 2.5x faster than with regular
disks with a single client, and scaling almost linearly with multiple clients
(picture attached; the Y axis is the number of high-level op/s our application
does, the X axis is the number of clients). The setup has been in use for the
last 3 months. It looks promising, but for production we need twice the disk
size of the test system. Until today, I was planning to use RAID10 with a HW
controller...

Regarding ZFS: we use ZFS on Linux and its behaviour is not as good as on Solaris.
Let me rephrase: performance is unpredictable. We run RAIDZ2 with 30x3TB disks.
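
For a rough sense of what the two layouts mentioned above give in usable space,
here is a minimal sketch. The function names are made up for illustration, and
the vdev layout of the 30-disk pool is not stated in this thread, so the vdev
counts below are assumptions. The figures ignore ZFS metadata, padding and
reserved space, so a real pool reports somewhat less.

```python
def raid10_usable_tb(disks: int, disk_tb: float) -> float:
    """RAID10 stripes across mirrored pairs, so half the raw space is usable."""
    if disks < 2 or disks % 2:
        raise ValueError("RAID10 needs an even number of disks (at least 2)")
    return disks / 2 * disk_tb


def raidz2_usable_tb(disks: int, disk_tb: float, vdevs: int = 1) -> float:
    """RAIDZ2 spends two disks' worth of parity in every vdev.

    'vdevs' is an assumption: the thread does not say how the disks are grouped.
    """
    if vdevs < 1 or disks % vdevs:
        raise ValueError("disks must divide evenly across vdevs")
    per_vdev = disks // vdevs
    if per_vdev <= 2:
        raise ValueError("a RAIDZ2 vdev needs more than 2 disks to hold data")
    return vdevs * (per_vdev - 2) * disk_tb


if __name__ == "__main__":
    # 30 x 3 TB disks, as in the pool described above.
    print("RAIDZ2, one 30-disk vdev:   ", raidz2_usable_tb(30, 3.0), "TB")
    print("RAIDZ2, three 10-disk vdevs:", raidz2_usable_tb(30, 3.0, vdevs=3), "TB")
    print("RAID10, same disks:         ", raid10_usable_tb(30, 3.0), "TB")
```

The point of the comparison is only the capacity trade-off: RAID10 gives up half
the raw space, while RAIDZ2 gives up two disks per vdev. It says nothing about
the performance differences discussed in this thread.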

Tigran.

>
> Haven't built one yet. Hope to, later this year. Steve, I would love to know
> more about how you're getting on with your NVMe disk in postgres!
>
> Graeme.
>
> On 07 Jul 2015, at 12:28, Mkrtchyan, Tigran <tigran(dot)mkrtchyan(at)desy(dot)de> wrote:
>
>> Thanks for the Info.
>>
>> So if RAID controllers are not an option, what one should use to build
>> big databases? LVM with xfs? BtrFs? Zfs?
>>
>> Tigran.
>>
>> ----- Original Message -----
>>> From: "Graeme B. Bell" <graeme(dot)bell(at)nibio(dot)no>
>>> To: "Steve Crawford" <scrawford(at)pinpointresearch(dot)com>
>>> Cc: "Wes Vaske (wvaske)" <wvaske(at)micron(dot)com>, "pgsql-performance"
>>> <pgsql-performance(at)postgresql(dot)org>
>>> Sent: Tuesday, July 7, 2015 12:22:00 PM
>>> Subject: Re: [PERFORM] New server: SSD/RAID recommendations?
>>
>>> Completely agree with Steve.
>>>
>>> 1. Intel NVMe looks like the best bet if you have modern enough hardware for
>>> NVMe. Otherwise e.g. S3700 mentioned elsewhere.
>>>
>>> 2. RAID controllers.
>>>
>>> We have e.g. 10-12 of these here and e.g. 25-30 SSDs, among various machines.
>>> This might give people an idea about where the risk lies in the path from disk to
>>> CPU.
>>>
>>> We've had 2 RAID card failures in the last 12 months that nuked the array with
>>> days of downtime, and 2 problems with batteries suddenly becoming useless or
>>> suddenly reporting wildly varying temperatures/overheating. There may have been
>>> other RAID problems I don't know about.
>>>
>>> Our IT dept were replacing Seagate HDDs last year at a rate of 2-3 per week (I
>>> guess they have 100-200 disks?). We also have about 25-30 Hitachi/HGST HDDs.
>>>
>>> So by my estimates:
>>> 30% annual problem rate with RAID controllers
>>> 30-50% failure rate with Seagate HDDs (backblaze saw similar results)
>>> 0% failure rate with HGST HDDs.
>>> 0% failure in our SSDs. (to be fair, our one samsung SSD apparently has a bug
>>> in TRIM under linux, which I'll need to investigate to see if we have been
>>> affected by).
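
The RAID controller figure in the list above can be reproduced with simple
arithmetic. This is only a sketch: the function name is hypothetical, and
counting all four reported incidents (2 card failures plus 2 battery problems)
against the 10-12 controllers is an assumption about how the estimate was made.

```python
def annual_problem_rate(incidents: int, units: int, months: float = 12.0) -> float:
    """Incidents per unit, scaled to a 12-month period."""
    if units <= 0 or months <= 0:
        raise ValueError("units and months must be positive")
    return incidents / units * (12.0 / months)


if __name__ == "__main__":
    # Assumption: 2 card failures + 2 battery problems = 4 incidents in 12 months.
    low = annual_problem_rate(4, 12)   # if there are 12 controllers
    high = annual_problem_rate(4, 10)  # if there are 10 controllers
    print(f"RAID controller problem rate: {low:.0%} to {high:.0%} per year")
```

Under that assumption the result lands in the same range as the roughly 30%
quoted above.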
>>>
>>> also, RAID controllers aren't free - not just the money but also the management
>>> of them (ever tried writing a complex install script that interacts with
>>> MegaCLI? It can be done but it's not much fun.). Just take a look at the
>>> MegaCLI manual and ask yourself... is this even worth it (if you have a good
>>> MTBF on an enterprise SSD).
>>>
>>> RAID was meant to be about ensuring availability of data. I have trouble
>>> believing that these days....
>>>
>>> Graeme Bell
>>>
>>>
>>> On 06 Jul 2015, at 18:56, Steve Crawford <scrawford(at)pinpointresearch(dot)com> wrote:
>>>
>>>>
>>>> 2. We don't typically have redundant electronic components in our servers. Sure,
>>>> we have dual power supplies and dual NICs (though generally to handle external
>>>> failures) and ECC-RAM but no hot-backup CPU or redundant RAM banks and...no
>>>> backup RAID card. Intel Enterprise SSD already have power-fail protection so I
>>>> don't need a RAID card to give me BBU. Given the MTBF of good enterprise SSD
>>>> I'm left to wonder if placing a RAID card in front merely adds a new point of
>>>> failure and scheduled-downtime-inducing hands-on maintenance (I'm looking at
>>>> you, RAID backup battery).
>>>
>>>
>>>
>>> --
>>> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance

Attachment: image/png, 4.8 KB