From: | Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> |
---|---|
To: | Takashi Horikawa <t-horikawa(at)aj(dot)jp(dot)nec(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Partitioned checkpointing |
Date: | 2015-09-26 12:30:27 |
Message-ID: | alpine.DEB.2.10.1509261416590.8351@sto |
Lists: | pgsql-hackers |
Hello,
These are interesting runs.
> In a situation in which small values are set in dirty_bytes and
> dirty_background_bytes, a buffer is likely to be stored on the HD
> immediately after the buffer is written to the kernel by the
> checkpointer. Thus, I tried a quick hack to make the checkpointer
> invoke the write system call to write a dirty buffer, immediately
> followed by a store operation for that buffer implemented with the
> sync_file_range() system call. # For reference, I attach the patch. As
> shown in file_sync_range.JPG, this strategy is considered to have been
> effective.
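A minimal C sketch of the write-then-flush idea quoted above, for readers who have not looked at the attached patch. It is not that patch: the helper name write_block_and_flush is hypothetical, pwrite() stands in for whatever write call the checkpointer actually makes, and passing only SYNC_FILE_RANGE_WRITE (start write-back for the range without waiting for it to complete) is an assumption about the flags such a hack would use. sync_file_range() is Linux-specific and needs _GNU_SOURCE on glibc.

```c
#define _GNU_SOURCE          /* exposes sync_file_range() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one block at offset 'off', then immediately
 * ask the kernel to start writing that same byte range back to storage,
 * instead of leaving it dirty in the page cache until dirty_bytes or
 * dirty_background_bytes triggers write-back.
 * Returns 0 on success, -1 on a failed or short write or a failed flush.
 */
static int write_block_and_flush(int fd, const void *buf, size_t len, off_t off)
{
    assert(len > 0);

    ssize_t n = pwrite(fd, buf, len, off);
    if (n < 0 || (size_t) n != len)
        return -1;

    /* Assumed flag choice: initiate write-out, do not wait for it. */
    return sync_file_range(fd, off, (off_t) len, SYNC_FILE_RANGE_WRITE);
}
```

Because the flush is only initiated, the call returns quickly; the effect is to spread write-back over the checkpoint instead of leaving it all to the final fsync.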
Indeed. This approach is part of this current patch:
https://commitfest.postgresql.org/6/260/
Basically, what you do is call sync_file_range on each block, and you
tested on a high-end system, probably with a lot of battery-backed (BBU)
disk cache, which I guess allows the disk to reorder writes so as to
benefit from sequential write performance.
> In conclusion, as far as pgbench execution on Linux is concerned,
> using sync_file_range() is a promising solution.
I found that calling sync_file_range for every block could degrade
performance a bit under some conditions, at least on my low-end systems
(just a [raid] disk, no significant disk cache in front of it), so the
above patch aggregates neighboring writes so as to issue fewer
sync_file_range calls.
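The aggregation of neighboring writes can be illustrated by coalescing a sorted list of block numbers into runs of consecutive blocks, so that one flush request covers each run instead of one per block. This is a minimal sketch of the idea, not the code from the commitfest patch; the struct and function names are hypothetical. With PostgreSQL's default 8 kB block size, a run would map to one sync_file_range call at offset start * 8192 with length count * 8192.

```c
#include <assert.h>
#include <stddef.h>

/* A run of consecutive blocks: [start, start + count). */
struct block_run {
    unsigned start;
    unsigned count;
};

/*
 * Hypothetical helper: coalesce a sorted, duplicate-free array of block
 * numbers into runs of consecutive blocks. 'out' must have room for n
 * entries (the worst case, when no two blocks are adjacent).
 * Returns the number of runs written to 'out'.
 */
static size_t coalesce_blocks(const unsigned *blocks, size_t n,
                              struct block_run *out)
{
    assert((blocks != NULL && out != NULL) || n == 0);

    size_t nruns = 0;
    for (size_t i = 0; i < n; i++) {
        if (nruns > 0 &&
            blocks[i] == out[nruns - 1].start + out[nruns - 1].count) {
            out[nruns - 1].count++;        /* extends the current run */
        } else {
            out[nruns].start = blocks[i];  /* starts a new run */
            out[nruns].count = 1;
            nruns++;
        }
    }
    return nruns;
}
```

For example, blocks {1, 2, 3, 7, 8, 10} become three runs, (1, 3), (7, 2) and (10, 1), so three flush calls replace six.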
> That is, the checkpointer invokes sync_file_range() to store a buffer
> immediately after it writes the buffer in the kernel.
Yep. It is interesting that sync_file_range alone improves stability a lot
on your high-end system, although sorting is mandatory for low-end
systems.
My interpretation, already stated above, is that the hardware does the
sorting on the cached data at the disk level in your system.
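The sorting discussed above, done in software rather than left to a disk cache, amounts to ordering the buffers a checkpoint will write by file and then by block number, so that each file is written in ascending offset order. A minimal sketch, assuming a hypothetical buf_tag that identifies a buffer by a file id and block number; it is not the patch's actual data structure.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical identity of a dirty buffer: which file, which block. */
struct buf_tag {
    unsigned file_id;
    unsigned block_no;
};

/* qsort comparator: order by file first, then by block within the file. */
static int cmp_buf_tag(const void *a, const void *b)
{
    const struct buf_tag *x = a, *y = b;

    if (x->file_id != y->file_id)
        return x->file_id < y->file_id ? -1 : 1;
    if (x->block_no != y->block_no)
        return x->block_no < y->block_no ? -1 : 1;
    return 0;
}

/* Sort the list of pending checkpoint writes in place. */
static void sort_checkpoint_writes(struct buf_tag *tags, size_t n)
{
    assert(tags != NULL || n == 0);
    if (n > 1)
        qsort(tags, n, sizeof *tags, cmp_buf_tag);
}
```

Once sorted, neighboring blocks of the same file sit next to each other in the list, which is what makes aggregating them into fewer flush calls possible.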
--
Fabien.