From: | KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Improvement of checkpoint IO scheduler for stable transaction responses |
Date: | 2013-06-10 10:51:29 |
Message-ID: | 51B5AFB1.5030404@lab.ntt.co.jp |
Lists: | pgsql-hackers |
Hi,
I have created a patch that improves the checkpoint I/O scheduler so that
transaction response times stay stable.
* Problems with checkpoint I/O scheduling under heavy transaction load
Under heavy transaction load, I think the PostgreSQL checkpoint scheduler has
two problems, one at the start and one at the end of each checkpoint.
The first problem is an I/O spike at the start of each checkpoint. It is caused
by full-page writes: the first modification of each page after a checkpoint
begins writes a full page image to WAL, so WAL volume surges early in the
checkpoint. The WAL-based checkpoint scheduler therefore wrongly judges the
checkpoint to be behind schedule, even though it is not, and writes dirty
buffers too fast. This degrades transaction response times. I think the
WAL-based schedule is not appropriate at the start of a checkpoint.
The second problem is an fsync freeze at the end of a checkpoint. Normally,
checkpoint writes are flushed in the background by the OS I/O scheduler. When
that does not work well, the fsync calls at the end of the checkpoint freeze
I/O and slow down transactions. Unexpectedly slow transactions can trigger
monitoring errors in an HA cluster and degrade the user experience of an
application service. This is an especially serious problem for database systems
on cloud and virtual servers, which have limited I/O performance.
Unfortunately, the existing postgresql.conf parameters offer little help here.
We would rather have fast transaction responses than short checkpoints: a
checkpoint is short anyway, and making it a little longer is not a problem.
You may suggest setting checkpoint_segments and checkpoint_timeout to larger
values. However, a large checkpoint_segments fills the OS file cache with WAL
that is never read, wasting it, and a large checkpoint_timeout makes crash
recovery take a long time.
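To make the first problem concrete, here is a simplified Python model of the
stock schedule check, loosely following IsCheckpointOnSchedule() in
checkpointer.c. It is my own sketch, not PostgreSQL code or the patch: the
function and argument names, and the settings used (checkpoint_segments = 32,
checkpoint_timeout = 300 s, checkpoint_completion_target = 0.9), are
illustrative assumptions, not defaults. Write progress, scaled by
checkpoint_completion_target, must stay ahead of both the fraction of
checkpoint_segments consumed and the fraction of checkpoint_timeout elapsed; if
it falls behind either, the checkpointer stops sleeping between writes.

```python
def checkpoint_on_schedule(progress, wal_segments_used, elapsed_secs,
                           checkpoint_segments=32, checkpoint_timeout=300.0,
                           completion_target=0.9):
    """Simplified model of the stock checkpoint schedule check.

    progress          -- fraction (0..1) of the checkpoint's dirty buffers written
    wal_segments_used -- WAL segments generated since the checkpoint started
    elapsed_secs      -- seconds since the checkpoint started
    Defaults are illustrative settings, not PostgreSQL defaults.
    Returns True if the checkpointer may sleep, False if it must write faster.
    """
    # Writes should finish by completion_target of the interval, so scale progress.
    scaled = progress * completion_target
    # WAL-based estimate: how much of checkpoint_segments is already used up.
    elapsed_xlogs = wal_segments_used / checkpoint_segments
    # Time-based estimate: how much of checkpoint_timeout has passed.
    elapsed_time = elapsed_secs / checkpoint_timeout
    return scaled >= elapsed_xlogs and scaled >= elapsed_time


# 15% of buffers written 30 s into the checkpoint: on time by the clock.
print(checkpoint_on_schedule(0.15, 2, 30))  # True: little WAL generated
print(checkpoint_on_schedule(0.15, 8, 30))  # False: WAL burst looks like lateness
```

In the second call the clock says the checkpoint is on time, but the extra WAL
from full-page writes makes the WAL-based estimate report it as behind.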
* Proposed improvements to the checkpoint I/O scheduler
1. Fixing the heavy full-page-write I/O at the start of a checkpoint
My idea is very simple: at the start of a checkpoint, the schedule implied by
checkpoint_completion_target is relaxed. I added three parameters for this:
'checkpoint_smooth_target', 'checkpoint_smooth_margin' and
'checkpointer_write_delay'. 'checkpoint_smooth_target' is the point in
checkpoint progress up to which the checkpoint I/O schedule is smoothed.
'checkpoint_smooth_margin' makes the checkpoint schedule even smoother. It is a
heuristic parameter, but it solves this problem effectively.
'checkpointer_write_delay' is the sleep time between checkpoint writes. It is
nearly the same as 'bgwriter_delay' in PostgreSQL 9.1 and older.
For more details, please see the attached patch.
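The exact formula is in the attached patch and is not reproduced here. As a
rough illustration only, the Python sketch below shows one way the three
parameters could interact, under my own assumptions: while write progress is
below checkpoint_smooth_target, the WAL-based estimate is relaxed by
checkpoint_smooth_margin, and when the checkpointer is on schedule it sleeps
for checkpointer_write_delay before the next write. The function names, the
"subtract the margin" rule and the default values are hypothetical; the 200 ms
delay simply mirrors the default bgwriter_delay.

```python
def smoothed_on_schedule(progress, elapsed_xlogs, elapsed_time,
                         completion_target=0.9,
                         smooth_target=0.5, smooth_margin=0.2):
    """Hypothetical relaxed schedule check (not the patch's actual formula).

    progress      -- fraction (0..1) of dirty buffers written so far
    elapsed_xlogs -- fraction of checkpoint_segments consumed so far
    elapsed_time  -- fraction of checkpoint_timeout elapsed so far
    """
    scaled = progress * completion_target
    wal_limit = elapsed_xlogs
    if progress < smooth_target:
        # ASSUMPTION: early in the checkpoint, tolerate WAL running ahead by
        # smooth_margin, since much of it is full-page-write overhead.
        wal_limit -= smooth_margin
    return scaled >= wal_limit and scaled >= elapsed_time


def write_delay_ms(progress, elapsed_xlogs, elapsed_time,
                   checkpointer_write_delay=200):
    """Sleep before the next buffer write: the delay if on schedule, else 0."""
    if smoothed_on_schedule(progress, elapsed_xlogs, elapsed_time):
        return checkpointer_write_delay
    return 0


# Early on: 15% written, 25% of WAL budget used, 10% of time elapsed.
print(write_delay_ms(0.15, 0.25, 0.10))  # 200: keeps sleeping between writes
# Past smooth_target the margin no longer applies.
print(write_delay_ms(0.60, 0.60, 0.40))  # 0: genuinely behind, so write faster
```

Keeping the time-based check unchanged means the relaxation can only delay
writes early on; it never lets the checkpoint fall behind checkpoint_timeout.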
2. Fixing the fsync freeze at the end of a checkpoint
When the fsync freeze happens, issuing further fsyncs back to back is pointless
and stalls transactions. If an fsync takes a long time, the I/O queue is
flooded, so I think we should give I/O priority to transactions to keep
response times fast. The patch does this by inserting a sleep between fsyncs
when an fsync took a long time. This might seem to make the checkpoint much
longer, but in practice it does not: when fsyncs are slow, the I/O queue is
already packed with other I/O, including the checkpoint's own writes, and the
sleep only gives I/O priority to the other running transactions.
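A minimal Python sketch of this idea follows. It is not the patch's code: the
names fsync_all and slow_threshold, and the rule "after a slow fsync, pause for
as long as that fsync took", are my own assumptions. The fsync, clock and sleep
functions are injectable so the behaviour can be checked without a slow disk.

```python
import os
import time


def fsync_path(path):
    """Flush one file's dirty pages to stable storage."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def fsync_all(paths, slow_threshold=0.5, fsync=fsync_path,
              clock=time.monotonic, sleep=time.sleep):
    """Fsync each file in turn; after a slow fsync, pause before the next one.

    ASSUMPTION: a slow fsync means the device queue is saturated, so pausing
    for as long as that fsync took lets foreground transaction I/O through.
    Returns the list of pauses taken, in seconds.
    """
    pauses = []
    for i, path in enumerate(paths):
        start = clock()
        fsync(path)
        took = clock() - start
        # No pause after the last file: there is no further fsync to delay.
        if took > slow_threshold and i < len(paths) - 1:
            sleep(took)
            pauses.append(took)
    return pauses


# Simulated run: a fake fsync advances a fake clock by fixed durations.
durations = {"a": 0.125, "b": 2.0, "c": 0.25, "d": 3.0}
now = [0.0]


def fake_fsync(path):
    now[0] += durations[path]


print(fsync_all(["a", "b", "c", "d"], fsync=fake_fsync,
                clock=lambda: now[0], sleep=lambda s: None))  # [2.0]
```

Only the slow fsync of "b" triggers a pause; fast fsyncs proceed back to back,
so the extra checkpoint time is bounded by how congested the device already is.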
I tested my patch with the DBT-2 benchmark; the results are below. My patch
achieves higher transaction throughput and faster responses than unpatched
PostgreSQL. Checkpoints take a little longer than with unpatched PostgreSQL,
but not seriously so.
* DBT-2 results with this patch (compared with unpatched PostgreSQL 9.2.4)
I used the DBT-2 benchmark software from OSDL. I also used pg_statsinfo and
pg_stats_reporter in this benchmark.
- Patched PG (patched 9.2.4)
DBT-2 result: http://goo.gl/1PD3l
statsinfo report: http://goo.gl/UlGAO
settings: http://goo.gl/X4Whu
- Original PG (9.2.4)
DBT-2 result: http://goo.gl/XVxtj
statsinfo report: http://goo.gl/UT1Li
settings: http://goo.gl/eofmb
The measurement value improved by 4%, 'new-order 90%tile' by 20%, 'new-order
average' by 18%, 'new-order deviation' by 24%, and 'new-order maximum' by 27%.
pg_stats_reporter's report confirms higher throughput and WAL I/O while a
checkpoint is running. My patch delivers fast transaction responses and keeps
running transactions from being blocked.
The downside of my patch is longer checkpoints: checkpoint time increased by
about 10% to 20%. However, checkpoints still complete on schedule within
checkpoint_timeout. Please see the checkpoint results (http://goo.gl/NsbC6).
* Test server
Server: HP ProLiant DL360 G7
CPU: Xeon E5640 2.66GHz (1P/4C)
Memory: 18GB(PC3-10600R-9)
Disk: 4 x 146GB 15k rpm, RAID 1+0
RAID controller: P410i/256MB
This is not an advertisement for pg_statsinfo and pg_stats_reporter :-) They
are free software. If you have comments or other ideas about my patch, please
let me know.
Best Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
improvement_checkpoint_io-scheduler_v0.patch | text/x-diff | 9.8 KB |