Re: SSD + RAID

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: SSD + RAID
Date: 2010-03-02 06:13:29
Message-ID: 4B8CAC89.1020100@2ndquadrant.com
Lists: pgsql-performance

Bruce Momjian wrote:
> I always assumed SCSI disks had a write-through cache and therefore
> didn't need a drive cache flush command.
>

There's more detail on all this mess at
http://wiki.postgresql.org/wiki/SCSI_vs._IDE/SATA_Disks and it includes
this perception, which I've recently come to believe isn't actually
correct anymore. Like the IDE crowd, it looks like one day somebody
said "hey, we lose every write-heavy benchmark badly because we only
have a write-through cache", and that principle fell by the wayside.
What has been true, and I'm starting to think this is what we've all
been observing rather than a write-through cache, is that the proper
cache flushing commands have been there in working form for so much
longer that it's more likely your SCSI driver and drive do the right
thing if the filesystem asks them to. SCSI SYNCHRONIZE CACHE has a
much longer and prouder history than IDE's FLUSH_CACHE and SATA's
FLUSH_CACHE_EXT.
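
One rough way to see whether flushes appear to be honored on a given
system is to time a loop of small writes, each followed by fsync. The
sketch below is only an illustration: the function names, the 512 byte
block, and the 7200 RPM figure are assumptions of mine, not anything
from a specific tool. The reasoning is that a rotating drive with no
non-volatile cache in front of it can commit the same block roughly
once per rotation, so a rate far above RPM/60 suggests something in the
stack is caching writes. That may be perfectly legitimate (a
battery-backed controller cache, for example), and on some platforms
fsync does not force a drive cache flush at all, so treat the number as
a hint rather than proof.

```python
import os
import tempfile
import time


def max_rotational_commit_rate(rpm):
    """Rough ceiling on commits/sec if each flush waits one platter rotation."""
    return rpm / 60.0


def measure_fsync_rate(directory, n_writes=200, block=b"x" * 512):
    """Rewrite the same block n_writes times, calling fsync after each write.

    Returns the observed number of write+fsync cycles per second.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(n_writes):
            os.lseek(fd, 0, os.SEEK_SET)  # always overwrite the same block
            os.write(fd, block)
            os.fsync(fd)                  # ask the OS to push it to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return n_writes / max(elapsed, 1e-9)


if __name__ == "__main__":
    rate = measure_fsync_rate(".")
    limit = max_rotational_commit_rate(7200)  # assumed drive speed
    print(f"{rate:.0f} fsyncs/sec measured; about {limit:.0f} is the "
          f"rotational ceiling for the assumed drive")
```

Run it from a directory on the volume you care about; the temporary
file is removed afterwards.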

It's also worth noting that many current SAS drives, the current SCSI
incarnation, are basically SATA drives with a bridge chipset stuck onto
them, or with just the interface board swapped out. This is one reason why
top-end SAS capacities lag behind consumer SATA drives. They use the
consumers as beta testers to get the really fundamental firmware issues
sorted out, and once things are stable they start stamping out the
version with the SAS interface instead. (Note that there's a parallel
manufacturing approach that makes much smaller SAS drives, the 2.5"
server models or those at higher RPMs, that doesn't go through this
path. Those are also the really expensive models, since they don't
benefit from the same economies of scale.) The idea that these would
have fundamentally different write cache behavior doesn't really follow
from that development model.

At this point, there are only two common differences between "consumer"
and "enterprise" hard drives of the same size and RPM when there are
directly matching ones:

1) You might get SAS instead of SATA as the interface, which provides
the more mature command set I was talking about above--and therefore may
give you a sane write-back cache with proper flushing, which is all the
database really expects.

2) The timeouts when there's a read/write problem are tuned down in the
enterprise version, to be more compatible with RAID setups where you
want to push the drive off-line when this happens rather than presuming
you can fix it. Consumers would prefer that the drive spent a lot of
time doing heroics to try and save their sole copy of the apparently
missing data.

You might get a slightly higher grade of parts if you're lucky too; I
wouldn't count on it though. That seems to be saved for the high RPM or
smaller size drives only.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us
