Re: Performances issues with SSD volume ?

From: Thomas SIMON <tsimon(at)neteven(dot)com>
To: Glyn Astill <glynastill(at)yahoo(dot)co(dot)uk>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Performances issues with SSD volume ?
Date: 2015-05-21 12:28:09
Message-ID: 555DCF59.3060808@neteven.com

The disk was already in noop mode:

cat /sys/block/sdc/queue/scheduler
noop [deadline] cfq
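
For reference, the name in brackets is the scheduler currently in effect, and it
can be switched at runtime with something like:

echo noop > /sys/block/sdc/queue/scheduler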

Thomas

On 20/05/2015 19:03, Glyn Astill wrote:
> ----- Original Message -----
>> From: Glyn Astill <glynastill(at)yahoo(dot)co(dot)uk>
>> To: Thomas SIMON <tsimon(at)neteven(dot)com>
>> Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
>> Sent: Wednesday, 20 May 2015, 17:50
>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>
>>
>>
>>> From: Thomas SIMON <tsimon(at)neteven(dot)com>
>>> To: glynastill(at)yahoo(dot)co(dot)uk
>>> Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
>>> Sent: Wednesday, 20 May 2015, 16:41
>>> Subject: Re: [ADMIN] Performances issues with SSD volume ?
>>>
>>> Hi Glyn,
>>>
>>> I'll try to answer these points.
>>>
>>> I've run some benchmarks, and indeed 3.2 is not helping; not helping at all.
>>> I changed to 3.14 and the gap is quite big!
>>> With the pgbench RW test: 3.2 --> 4200 TPS; 3.14 --> 6900 TPS under the same
>>> conditions.
>>> With the pgbench RO test: 3.2 --> 37000 TPS; 3.14 --> 95000 TPS, same
>>> conditions too.
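>>>
>>> For reference, the tests were along these lines (the client/thread counts
>>> and database name here are illustrative, not the exact ones I used):
>>>
>>> pgbench -c 32 -j 8 -T 300 bench # RW: default TPC-B-like test
>>> pgbench -c 32 -j 8 -T 300 -S bench # RO: select-only test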
>>
>> That's a start then.
>>
>>> So things should be better now, but when the server was in production, even
>>> with the bad kernel, performance was already quite good before it quickly
>>> degraded.
>>> So I also think I have another configuration problem.
>>>
>>> You say you're IO bound, so some output from sar / iostat / dstat and
>>> pg_stat_activity etc before and during the issue would be of use.
>>>
>>> -> My server is not in production right now, so it is difficult to
>>> replay the production load and gather useful metrics.
>>> The best way I've found is to replay traffic from the logs with pgreplay.
>>> I hoped the server would fall over again while replaying this traffic, but
>>> it never happens ... another thing I can't understand ...
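>>>
>>> For reference, a pgreplay invocation along these lines (host, port, and log
>>> file name here are illustrative; -c tells pgreplay the log is in CSV format):
>>>
>>> pgreplay -c -h localhost -p 5432 postgresql.csv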
>>>
>>> Below is my dstat output while I replay this traffic (and so while the
>>> server runs normally).
>>> Unfortunately I have no more outputs from when the server's performance
>>> decreased.
>>
>> It's a shame we can't get any insight into activity on the server during
>> the issues.
>>>
>>> Other things you asked about:
>>>
>>> System memory size: 256 GB
>>> SSD model numbers and how many: 4 SSD disks; RAID 10; model
>>> INTEL SSDSC2BB480G4
>>> RAID controller: MegaRAID SAS 2208
>>> Partition alignments and stripe sizes: see fdisk output below
>>> Kernel options: the config file is here:
>>> ftp://ftp.ovh.net/made-in-ovh/bzImage/3.14.43/config-3.14.43-xxxx-std-ipv6-64
>>> Filesystem used and mount options: ext4, see mtab below
>>> IO scheduler: noop [deadline] cfq for my SSD RAID volume
>>> PostgreSQL version and configuration: 9.3.5
>>>
>>> max_connections=1800
>>> shared_buffers=8GB
>>> temp_buffers=32MB
>>> work_mem=100MB
>>> maintenance_work_mem=12GB
>>> bgwriter_lru_maxpages=200
>>> effective_io_concurrency=4
>>> wal_level=hot_standby
>>> wal_sync_method=fdatasync
>>> wal_writer_delay=2000ms
>>> commit_delay=1000
>>> checkpoint_segments=80
>>> checkpoint_timeout=15min
>>> checkpoint_completion_target=0.7
>>> archive_command='rsync ....'
>>> max_wal_senders=10
>>> wal_keep_segments=38600
>>> vacuum_defer_cleanup_age=100
>>> hot_standby = on
>>> max_standby_archive_delay = 5min
>>> max_standby_streaming_delay = 5min
>>> hot_standby_feedback = on
>>> random_page_cost = 1.0
>>> effective_cache_size = 240GB
>>> log_min_error_statement = warning
>>> log_min_duration_statement = 0
>>> log_checkpoints = on
>>> log_connections = on
>>> log_disconnections = on
>>> log_line_prefix = '%m|%u|%d|%c|'
>>> log_lock_waits = on
>>> log_statement = 'all'
>>> log_timezone = 'localtime'
>>> track_activities = on
>>> track_functions = pl
>>> track_activity_query_size = 8192
>>> autovacuum_max_workers = 5
>>> autovacuum_naptime = 30s
>>> autovacuum_vacuum_threshold = 40
>>> autovacuum_analyze_threshold = 20
>>> autovacuum_vacuum_scale_factor = 0.10
>>> autovacuum_analyze_scale_factor = 0.10
>>> autovacuum_vacuum_cost_delay = 5ms
>>> default_transaction_isolation = 'read committed'
>>> max_locks_per_transaction = 128
>>>
>>>
>>>
>>> Connection pool sizing (pgpool2)
>>> num_init_children = 1790
>>> max_pool = 1
>>
>> 1800 is quite a lot of connections, and with max_pool=1 in pgpool you're
>> effectively just using pgpool as a proxy (as I recall; my memory is a little
>> fuzzy on pgpool now). Unless your app is stateful in some way, or has unique
>> users for each of those 1800 connections, you should lower the number of
>> active connections. A general starting point is usually CPU cores * 2, so you
>> could raise max_pool and divide num_init_children by the same amount.
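>>
>> For example (purely illustrative, since I don't know your core count; a
>> factor of 10 just shows the arithmetic, keeping the product at 1790):
>>
>> num_init_children = 179
>> max_pool = 10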
>>
>> Hard to say what you need to do without knowing what exactly you're doing
>> though. What's the nature of the app(s)?
>>
>>> I'm also including the megacli parameters:
>>>
>>> Virtual Drive: 2 (Target Id: 2)
>>> Name :datassd
>>> RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
>>> Size : 893.25 GB
>>> Sector Size : 512
>>> Is VD emulated : Yes
>>> Mirror Data : 893.25 GB
>>> State : Optimal
>>> Strip Size : 256 KB
>>> Number Of Drives per span:2
>>> Span Depth : 2
>>> Default Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
>>> Cache if Bad BBU
>>> Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write
>>> Cache if Bad BBU
>>> Default Access Policy: Read/Write
>>> Current Access Policy: Read/Write
>>> Disk Cache Policy : Enabled
>>> Encryption Type : None
>>> Bad Blocks Exist: No
>>> PI type: No PI
>>>
>>> Is VD Cached: No
>>
>> Not using your raid controller's write cache, then? I'm not sure just how
>> important that is with SSDs these days, but if you've got a BBU, set the
>> policy to "WriteBack". Also change "Cache if Bad BBU" to "No
>> Write Cache if Bad BBU" if you do that.
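>>
>> Something along these lines with MegaCli should do it (illustrative; the
>> -L2 -a0 values assume the VD 2 / adapter 0 numbering from your output above):
>>
>> MegaCli -LDSetProp WB -L2 -a0
>> MegaCli -LDSetProp NoCachedBadBBU -L2 -a0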
>>
>>
>>> Other outputs:
>>> fdisk -l
>>>
>>> Disk /dev/sdc: 959.1 GB, 959119884288 bytes
>>> 255 heads, 63 sectors/track, 116606 cylinders, total 1873281024 sectors
>>> Units = sectors of 1 * 512 = 512 bytes
>>> Sector size (logical/physical): 512 bytes / 4096 bytes
>>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>> Disk identifier: 0x00000000
>>>
>>> Disk /dev/mapper/vg_datassd-lv_datassd: 751.6 GB, 751619276800 bytes
>>> 255 heads, 63 sectors/track, 91379 cylinders, total 1468006400 sectors
>>> Units = sectors of 1 * 512 = 512 bytes
>>> Sector size (logical/physical): 512 bytes / 4096 bytes
>>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>> Disk identifier: 0x00000000
>>>
>>>
>>> cat /etc/mtab
>>> /dev/mapper/vg_datassd-lv_datassd /datassd ext4
>>> rw,relatime,discard,nobarrier,data=ordered 0 0
>>> (I added the nobarrier option.)
>>>
>>>
>>> cat /sys/block/sdc/queue/scheduler
>>> noop [deadline] cfq
>>>
>>
>> You could swap relatime for noatime,nodiratime.
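>>
>> e.g. an fstab entry along these lines (illustrative):
>>
>> /dev/mapper/vg_datassd-lv_datassd /datassd ext4 noatime,nodiratime,discard,nobarrier,data=ordered 0 0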
>>
>
> You could also see if the noop scheduler makes any improvement.
>
>
