From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Intel SSDs that may not suck
Date: 2011-04-07 02:21:55
Message-ID: 4D9D1FC3.4020207@2ndQuadrant.com
Lists: pgsql-performance

Here's the new Intel 3rd generation 320 series drive:

$ sudo smartctl -i /dev/sdc
Device Model: INTEL SSDSA2CW120G3
Firmware Version: 4PC10302
User Capacity: 120,034,123,776 bytes
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4

Since I have to go chant at the unbelievers next week (MySQL Con), I
don't have time for a really thorough look here. But I've made a first
pass through my usual benchmarks without any surprises.

bonnie++ meets expectations with 253MB/s reads, 147MB/s writes, and 3935
seeks/second:

Version 1.03e     ------Sequential Output------ --Sequential Input- --Random-
                  -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine      Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
toy        32144M           147180  7  77644  3           253893  5  3935  15

Using sysbench to generate a 100GB file and randomly seek around it
gives a similar figure:

Extra file open flags: 0
100 files, 1Gb each
100Gb total file size
Block size 8Kb
Number of random requests for random IO: 10000
Read/Write ratio for combined random IO test: 1.50
Using synchronous I/O mode
Doing random read test
Threads started!
Done.

Operations performed: 10000 reads, 0 writes, 0 Other = 10000 Total
Read 78.125Mb Written 0b Total transferred 78.125Mb (26.698Mb/sec)
3417.37 Requests/sec executed

So that's the basic range of performance: up to 250MB/s on reads, but
potentially as low as 3400 IOPS = 27MB/s on really random workloads. I
can make it do worse than that as you'll see in a minute.
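The IOPS-to-bandwidth relationship is just arithmetic over the 8KB block
size sysbench was configured with; a quick sanity check of my own (not
part of the original test run):

```python
# Cross-check the sysbench figures: requests/sec times the 8KB block
# size should reproduce the reported throughput in MB/sec.
block_size_kb = 8            # sysbench "Block size 8Kb" above
requests_per_sec = 3417.37   # reported "Requests/sec executed"

mb_per_sec = requests_per_sec * block_size_kb / 1024
print(round(mb_per_sec, 3))  # matches the reported 26.698Mb/sec
```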

At a database scale of 500, I can get 2357 TPS:

postgres(at)toy:~$ /usr/lib/postgresql/8.4/bin/pgbench -c 64 -T 300 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 500
query mode: simple
number of clients: 64
duration: 300 s
number of transactions actually processed: 707793
tps = 2357.497195 (including connections establishing)
tps = 2357.943894 (excluding connections establishing)

This is basically the same performance as the 4-disk setup with a 256MB
battery-backed write controller I profiled at
http://www.2ndquadrant.us/pgbench-results/index.htm ; there XFS got as
high as 2332 TPS, albeit with a PostgreSQL patched for better
performance than the one I used here. This system has 16GB of RAM, so
this is exercising write speed only, without needing to read anything
from disk; not too hard for regular drives to do. Performance holds at
a scale of 1000, however:

postgres(at)toy:~$ /usr/lib/postgresql/8.4/bin/pgbench -c 64 -T 300 -l pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1000
query mode: simple
number of clients: 64
duration: 300 s
number of transactions actually processed: 586043
tps = 1953.006031 (including connections establishing)
tps = 1953.399065 (excluding connections establishing)

Whereas my regular drives are lucky to hit 350 TPS here. So this is the
typical sweet spot for SSD: the workload is bigger than RAM, but not so
much bigger than RAM that reads & writes become completely random.

If I crank the scale way up, to 4000 = 58GB, now I'm solidly in
seek-bound behavior, which runs about twice as fast as my regular drive
array does here (that's around 200 TPS on this test):
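Those scale-to-size numbers follow the usual rule of thumb of roughly
15MB of pgbench database per unit of scale (an approximation on my
part, dominated by pgbench_accounts at 100,000 rows per scale unit):

```python
# Estimate pgbench database size from the scaling factor, using the
# rough ~15MB-per-scale-unit approximation.
def pgbench_size_gb(scale, mb_per_unit=15):
    return scale * mb_per_unit / 1024

print(int(pgbench_size_gb(500)))    # ~7GB: fits easily in 16GB RAM
print(int(pgbench_size_gb(4000)))   # ~58GB: well past RAM, seek-bound
```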

postgres(at)toy:~$ /usr/lib/postgresql/8.4/bin/pgbench -T 1800 -c 64 -l pgbench
starting vacuum...end.

transaction type: TPC-B (sort of)
scaling factor: 4000
query mode: simple
number of clients: 64
duration: 1800 s
number of transactions actually processed: 731568
tps = 406.417254 (including connections establishing)
tps = 406.430713 (excluding connections establishing)

Here's a snapshot of typical drive activity when running this:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.29    0.00    1.30   54.80    0.00   41.61

Device: rrqm/s wrqm/s    r/s    w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await svctm  %util
sdc       0.00 676.67 443.63 884.00   7.90  12.25    31.09    41.77  31.45  0.75  99.93

So we're down to around 20MB/s, just as sysbench predicted a seek-bound
workload would be on these drives.
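The iostat fields hang together, too; combined read+write bandwidth is
what "around 20MB/s" refers to. A quick cross-check of my own, assuming
avgrq-sz is reported in 512-byte sectors as usual:

```python
# Cross-check the iostat snapshot: read + write MB/s, and the average
# request size implied by the per-second totals.
r_per_sec, w_per_sec = 443.63, 884.00   # r/s, w/s from the snapshot
r_mb, w_mb = 7.90, 12.25                # rMB/s, wMB/s

total_mb = r_mb + w_mb
print(round(total_mb, 2))               # -> "around 20MB/s"

# Derive the average request size, in 512-byte sectors, from the totals;
# it should land close to the reported avgrq-sz of 31.09.
avg_req_sectors = (total_mb * 1024 * 1024 / 512) / (r_per_sec + w_per_sec)
print(round(avg_req_sectors, 1))
```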

I can still see checkpoint spikes here where sync times go upward:

2011-04-06 20:40:58.969 EDT: LOG: checkpoint complete: wrote 2959
buffers (9.0%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=147.300 s, sync=32.885 s, total=181.758 s

But the drive seems to never become unresponsive for longer than a second:

postgres(at)toy:~$ cat pgbench_log.4585 | cut -d" " -f 6 | sort -n | tail
999941
999952
999956
999959
999960
999970
999977
999984
999992
999994

Power-plug pull tests with diskchecker.pl and a write-heavy database
load didn't turn up anything funny about the write cache:

[witness]
$ wget http://code.sixapart.com/svn/tools/trunk/diskchecker.pl
$ chmod +x ./diskchecker.pl
$ ./diskchecker.pl -l

[server with SSD]
$ wget http://code.sixapart.com/svn/tools/trunk/diskchecker.pl
$ chmod +x ./diskchecker.pl
$ diskchecker.pl -s grace create test_file 500

diskchecker: running 20 sec, 69.67% coverage of 500 MB (38456 writes;
1922/s)
diskchecker: running 21 sec, 71.59% coverage of 500 MB (40551 writes;
1931/s)
diskchecker: running 22 sec, 73.52% coverage of 500 MB (42771 writes;
1944/s)
diskchecker: running 23 sec, 75.17% coverage of 500 MB (44925 writes;
1953/s)
[pull plug]

/home/gsmith/diskchecker.pl -s grace verify test_file
verifying: 0.00%
verifying: 0.73%
verifying: 7.83%
verifying: 14.98%
verifying: 22.10%
verifying: 29.23%
verifying: 36.39%
verifying: 43.50%
verifying: 50.65%
verifying: 57.70%
verifying: 64.81%
verifying: 71.86%
verifying: 79.02%
verifying: 86.11%
verifying: 93.15%
verifying: 100.00%
Total errors: 0

2011-04-06 21:43:09.377 EDT: LOG: database system was interrupted; last
known up at 2011-04-06 21:30:27 EDT
2011-04-06 21:43:09.392 EDT: LOG: database system was not properly shut
down; automatic recovery in progress
2011-04-06 21:43:09.394 EDT: LOG: redo starts at 6/BF7B2880
2011-04-06 21:43:10.687 EDT: LOG: unexpected pageaddr 5/C2786000 in log
file 6, segment 205, offset 7888896
2011-04-06 21:43:10.687 EDT: LOG: redo done at 6/CD784400
2011-04-06 21:43:10.687 EDT: LOG: last completed transaction was at log
time 2011-04-06 21:39:00.551065-04
2011-04-06 21:43:10.705 EDT: LOG: checkpoint starting: end-of-recovery
immediate
2011-04-06 21:43:14.766 EDT: LOG: checkpoint complete: wrote 29915
buffers (91.3%); 0 transaction log file(s) added, 0 removed, 106
recycled; write=0.146 s, sync=3.904 s, total=4.078 s
2011-04-06 21:43:14.777 EDT: LOG: database system is ready to accept
connections

So far, this drive is living up to expectations, without doing anything
unexpectedly good or bad. When doing the things where SSD has the
biggest advantage over mechanical drives, it's more than 5X as fast as
a 4-disk array (3-disk DB + WAL) with a BBWC. But on really huge
workloads, where the worst-case behavior of the drive is being hit,
that falls to closer to a 2X advantage. And if you're doing work that
isn't very random at all, the drive only matches regular disk.

I like not having surprises in this sort of thing though. Intel 320
series gets a preliminary thumbs-up from me. I'll be happy when these
are mainstream enough that I can finally exit the anti-Intel SSD pulpit
I've been standing on the last two years.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
