Re: Suggestions for an HBA controller (6 x SSDs + mdadm RAID10)

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Pietro Pugni <pietro(dot)pugni(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Suggestions for an HBA controller (6 x SSDs + mdadm RAID10)
Date: 2017-02-21 15:04:34
Message-ID: CAHyXU0x4cbFus3cJ+oTi5_FgWHc7ChWM2jtHDBYBwivfYXVppA@mail.gmail.com
Lists: pgsql-performance

On Tue, Feb 21, 2017 at 7:49 AM, Pietro Pugni <pietro(dot)pugni(at)gmail(dot)com>
wrote:

> Hi there,
> I configured an IBM X3650 M4 for development and testing purposes. It’s
> composed of:
> - 2 x Intel Xeon E5-2690 @ 2.90Ghz (2 x 8 physical Cores + HT)
> - 96GB RAM DDR3 1333MHz (12 x 8GB)
> - 2 x 146GB SAS HDDs @ 15k rpm configured in RAID1 (mdadm)
> - 6 x 525GB SATA SSDs (over-provisioned at 25%, so 393GB available)
>
> I’ve done a lot of testing focused on 4k and 8k workloads and found that
> the IOPS of those SSDs are half the expected figure. On serverfault.com
> someone suggested that the bottleneck is probably the embedded RAID
> controller, an IBM ServeRaid m5110e, which is based on an LSI 2008 chip.
>
> I’m using the disks in JBOD mode with mdadm software RAID, which is
> blazing fast. The CPU is also very fast, so I don’t mind having a little
> overhead due to software RAID.
>
> My typical workload is Postgres run as a DWH with 1 to 2 billion rows,
> big indexes, partitions and so on, but also intensive statistical
> computations.
>
>
> Here’s my post on serverfault.com
> ( http://serverfault.com/questions/833642/slow-ssd-performance-ibm-x3650-m4-7915 )
> and here’s a graph of those six SSDs evaluated using fio as stand-alone
> disks (outside of the RAID):
>
> [image: https://i.stack.imgur.com/ZMhUJ.png]
>
> All those IOPS figures should be double if everything were working
> correctly. The curve trend is correct for increasing IO depths.
>
>
> Anyway, I would like to buy an HBA controller that leverages those 6 SSDs.
> Each SSD should deliver about 80k to 90k IOPS, so in RAID10 I should get
> ~240k IOPS (6 x 80k / 2) and in RAID0 ~480k IOPS (6 x 80k). I’ve seen that
> mdadm effectively scales performance, but the controller caps the overall
> IOPS at ~120k (exactly half of the expected IOPS).
>
> *What HBA controller able to handle 500k IOPS would you suggest?*
>
>
> My server can hold 8 more SSDs, for a total of 14 SSDs and 1260k
> theoretical IOPS. Adding only 2 more disks would already give 720k
> theoretical IOPS in RAID0.
>
> *What HBA controller able to handle more than 700k IOPS would you
> suggest?*
>
> *Do you have any advice about using mdadm software RAID on SATA III SSDs
> with a plain HBA?*
>

Random points/suggestions:
*) mdadm is the way to go. I think you'll get bandwidth constrained on
most modern HBAs unless they are really crappy. On reasonably modern
hardware, storage is rarely the bottleneck anymore (which is a great place
to be). Fancy RAID controllers may actually hurt performance -- they are
obsolete IMNSHO.
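
For example, a software RAID10 across the six SSDs might look like this
(device names are assumptions; verify them with lsblk first):

  # assumes the SSDs are sdb through sdg
  mdadm --create /dev/md0 --level=10 --raid-devices=6 /dev/sd[b-g]
  mkfs.xfs /dev/md0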

*) Small point, but you'll want to crank effective_io_concurrency (see:
https://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH%3DHWh1WbLRsioe%3DmzRJTHwtr%3D2azsTdQ%40mail.gmail.com).
It only affects certain kinds of queries, but when it works it really
works. Those benchmarks were done on my crapbox Dell workstation!
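
A minimal sketch (256 is just a starting point for SSDs, not a tuned
value; the setting mainly helps bitmap heap scans):

  -- per session, or persist it with ALTER SYSTEM / postgresql.conf
  SET effective_io_concurrency = 256;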

*) For very high transaction rates, you can get a lot of benefit from
disabling synchronous_commit if you are willing to accommodate the risk. I
do not recommend disabling fsync unless you are prepared to regenerate the
entire database at any time.
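
A sketch of the safer of the two knobs; with synchronous_commit off a
crash can lose the last few hundred milliseconds of commits, but the
database itself stays consistent (unlike with fsync off):

  ALTER SYSTEM SET synchronous_commit = off;
  SELECT pg_reload_conf();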

*) Don't assume indefinite linear scaling as you increase storage capacity
-- the database itself can become the bottleneck, especially for writing.
To improve write performance, classic optimization strategies of trying to
intelligently bundle writes around units of work still apply. If you are
expecting high rates of write activity your engineering focus needs to be
here for sure (read scaling is comparatively pretty easy).
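
As an illustration of bundling writes around units of work (table and
values below are hypothetical), one transaction per row is far slower
than one transaction per batch, and COPY is faster still:

  BEGIN;
  INSERT INTO measurements (ts, device_id, value)
  VALUES ('2017-02-21 15:00', 1, 0.5),
         ('2017-02-21 15:01', 1, 0.7),
         ('2017-02-21 15:02', 2, 1.2);
  COMMIT;

  -- for bulk loads, COPY beats any INSERT strategy
  COPY measurements FROM '/path/to/data.csv' WITH (FORMAT csv);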

*) I would start doing your benchmarking with pgbench since that is going
to most closely reflect measured production performance.
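
Something along these lines (scale factor, client and thread counts are
assumptions to adjust to your hardware):

  pgbench -i -s 1000 bench               # initialize a ~15GB data set
  pgbench -c 32 -j 8 -T 600 -P 10 bench  # 32 clients, 10-minute run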

> My typical workload is Postgres run as a DWH with 1 to 2 billion rows,
> big indexes, partitions and so on, but also intensive statistical
> computations.

If this is the case your stack performance is going to be based on data
structure design. Make liberal use of (see the sketch after this list):
*) natural keys
*) constraint exclusion for partition selection
*) BRIN indexes -- amazing, if you can work within their limitations
*) partial indexing
*) covering indexes. Don't forget to vacuum your partitions before you
make them live if you use them
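
A sketch of those ideas on a hypothetical events table (all names are
invented; note that in the 9.x era "covering" simply means listing the
extra columns in the index so index-only scans can answer the query):

  CREATE TABLE events (ts timestamptz, device_id int, value numeric);

  -- 9.x-style partition; the CHECK constraint drives constraint exclusion
  CREATE TABLE events_2017_02 (
    CHECK (ts >= '2017-02-01' AND ts < '2017-03-01')
  ) INHERITS (events);

  -- BRIN: tiny index, great for naturally ordered columns like timestamps
  CREATE INDEX ON events_2017_02 USING brin (ts);

  -- partial index: only index the slice of rows you actually query
  CREATE INDEX ON events_2017_02 (device_id) WHERE value > 0;

  -- covering composite index to enable index-only scans
  CREATE INDEX ON events_2017_02 (device_id, ts, value);

  -- vacuum before making the partition live so the visibility map is set
  VACUUM ANALYZE events_2017_02;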

If your data is going to get really big and/or query activity is expected
to be high, keep an eye on your scale-out strategy. Going monolithic to
bootstrap your app is the right choice IMO, but start thinking about the
longer term if you are expecting growth. I'm starting to come around to
the perspective that lift/shift scale-out using postgres_fdw without an
insane amount of app retooling could be a viable option by Postgres 11/12
or so. For my part I scaled out over asynchronous dblink, which is a much
more maintenance-heavy strategy (but it works fabulously, although I wish
you could connect asynchronously).
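
For reference, an asynchronous dblink fan-out looks roughly like this
(connection strings and the result shape are assumptions):

  -- open connections to two shards; fire both queries without blocking
  SELECT dblink_connect('shard1', 'host=shard1 dbname=dwh');
  SELECT dblink_connect('shard2', 'host=shard2 dbname=dwh');
  SELECT dblink_send_query('shard1', 'SELECT count(*) FROM events');
  SELECT dblink_send_query('shard2', 'SELECT count(*) FROM events');
  -- the queries now run concurrently; collect results as they finish
  SELECT * FROM dblink_get_result('shard1') AS t(n bigint);
  SELECT * FROM dblink_get_result('shard2') AS t(n bigint);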

merlin
