From: | Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Changing default value of wal_sync_method to open_datasync on Linux |
Date: | 2018-02-20 00:57:54 |
Message-ID: | b7422f6a-91dc-c562-7315-aa6ca64cab5c@catalyst.net.nz |
Lists: | pgsql-hackers |
On 20/02/18 13:27, Tsunakawa, Takayuki wrote:
> Hello,
>
> I propose changing the default value of wal_sync_method from fdatasync to open_datasync on Linux. The patch is attached. I suspect this may be controversial, so I'd like to hear your opinions.
>
> The reason for the change is better performance. Robert Haas said that open_datasync was much faster than fdatasync with NVRAM in this thread:
>
> https://www.postgresql.org/message-id/flat/C20D38E97BCB33DAD59E3A1(at)lab(dot)ntt(dot)co(dot)jp#C20D38E97BCB33DAD59E3A1@lab.ntt.co.jp
>
> pg_test_fsync shows higher figures for open_datasync:
>
> [SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
> --------------------------------------------------
> 5 seconds per test
> O_DIRECT supported on this platform for open_datasync and open_sync.
>
> Compare file sync methods using one 8kB write:
> (in wal_sync_method preference order, except fdatasync is Linux's default)
>         open_datasync                     50829.597 ops/sec      20 usecs/op
>         fdatasync                         42094.381 ops/sec      24 usecs/op
>         fsync                             42209.972 ops/sec      24 usecs/op
>         fsync_writethrough                              n/a
>         open_sync                         48669.605 ops/sec      21 usecs/op
> --------------------------------------------------
>
>
> [HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
> (the figures seem oddly high, though; this may be due to some VM configuration)
> --------------------------------------------------
> 5 seconds per test
> O_DIRECT supported on this platform for open_datasync and open_sync.
>
> Compare file sync methods using one 8kB write:
> (in wal_sync_method preference order, except fdatasync is Linux's default)
>         open_datasync                     34648.778 ops/sec      29 usecs/op
>         fdatasync                         31570.947 ops/sec      32 usecs/op
>         fsync                             27783.283 ops/sec      36 usecs/op
>         fsync_writethrough                              n/a
>         open_sync                         35238.866 ops/sec      28 usecs/op
> --------------------------------------------------
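For readers unfamiliar with what the fdatasync and open_datasync rows measure, here is a minimal C sketch of the difference, assuming Linux and a POSIX file system. It is not pg_test_fsync's code: the function name `bench_sync`, the probe-file path, and the `BLOCK_SIZE` constant are illustrative assumptions, and unlike pg_test_fsync (which reports that it uses O_DIRECT for open_datasync here) the sketch never adds O_DIRECT. With `use_o_dsync == 0` each 8kB write is followed by `fdatasync()`; otherwise the file is opened with `O_DSYNC`, so every write is itself synchronous.

```c
#define _POSIX_C_SOURCE 200809L  /* pwrite, fdatasync, O_DSYNC, clock_gettime */
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

#define BLOCK_SIZE 8192  /* assumed: one 8kB write per operation */

/* Monotonic time in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/*
 * Hypothetical helper: rewrite the first 8kB of `path` `iters` times and
 * return the achieved ops/sec, or -1.0 on any I/O error.
 *   use_o_dsync == 0: write(), then fdatasync()   (like fdatasync)
 *   use_o_dsync != 0: open with O_DSYNC, write()   (like open_datasync,
 *                     but without O_DIRECT)
 */
double bench_sync(const char *path, int iters, int use_o_dsync)
{
    static char buf[BLOCK_SIZE];  /* zero-filled block */
    assert(iters > 0);

    int flags = O_RDWR | O_CREAT | (use_o_dsync ? O_DSYNC : 0);
    int fd = open(path, flags, 0600);
    if (fd < 0)
        return -1.0;

    double start = now_sec();
    for (int i = 0; i < iters; i++) {
        /* Overwrite offset 0 so that, after the first write, the size is fixed. */
        if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t) sizeof buf ||
            (!use_o_dsync && fdatasync(fd) != 0)) {
            close(fd);
            return -1.0;
        }
    }
    double elapsed = now_sec() - start;
    close(fd);
    return elapsed > 0.0 ? iters / elapsed : -1.0;
}
```

Calling `bench_sync(path, 1000, 0)` and `bench_sync(path, 1000, 1)` with a path on the volume under test gives two comparable ops/sec figures; the results only describe the device and mount options that the probe file actually lives on.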
>
>
> pgbench shows only marginally better results, and the difference is within the margin of error. The following are the tps figures for pgbench's default read/write workload. I ran the test with all tables and indexes (except pgbench_history) preloaded with pg_prewarm, and with no checkpoint occurring during the runs. Beforehand, I ran a write workload so that no new WAL file would be created during the benchmark run.
>
> [SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
> --------------------------------------------------
>                 run 1   run 2   run 3     avg
> fdatasync       17610   17164   16678   17150
> open_datasync   17847   17457   17958   17754 (+3%)
>
> [HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
> (the figures seem oddly high, though; this may be due to some VM configuration)
> --------------------------------------------------
>                 run 1   run 2   run 3     avg
> fdatasync        4911    5225    5198    5111
> open_datasync    4996    5284    5317    5199 (+1%)
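The avg column and the percentages can be rechecked with a few lines of C; the helper names `mean`, `spread`, and `pct_change` are illustrative only. On SSD the means are about 17150.7 tps (fdatasync) and 17754 tps (open_datasync), roughly +3.5%; on HDD the change is about +1.7%, which the table truncates to +1%. As a crude check (a range, not a proper statistical test), the roughly 603 tps gap on SSD is smaller than the 932 tps spread between the best and worst fdatasync runs.

```c
#include <assert.h>

/* Arithmetic mean of n samples (n must be positive). */
double mean(const double *v, int n)
{
    assert(n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum / n;
}

/* Max minus min of n samples: a crude measure of run-to-run noise. */
double spread(const double *v, int n)
{
    assert(n > 0);
    double lo = v[0], hi = v[0];
    for (int i = 1; i < n; i++) {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    return hi - lo;
}

/* Relative change of `candidate` over `baseline`, in percent. */
double pct_change(double baseline, double candidate)
{
    return (candidate - baseline) / baseline * 100.0;
}
```

For example, with the SSD rows stored as `{17610, 17164, 16678}` and `{17847, 17457, 17958}`, `pct_change(mean(fd, 3), mean(od, 3))` evaluates to about 3.52.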
>
>
> As the comment removed by the patch describes, when wal_sync_method is open_datasync (or open_sync), open() fails with errno=EINVAL if the ext4 volume is mounted with data=journal. That's because open() is passed O_DIRECT in that case. I don't think that's a problem in practice: data=journal is not used where performance matters, and O_DIRECT is only used when wal_level is changed from its default (replica) to minimal and max_wal_senders is set to 0.
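A minimal C sketch of the flag logic described above, assuming Linux/glibc. The helpers `open_datasync_flags` and `open_wal_file`, and the boolean `wal_minimal` (standing in for wal_level = minimal with max_wal_senders = 0), are hypothetical simplifications; PostgreSQL makes this decision in its own WAL code (get_sync_bit() in xlog.c in releases of this era), which differs in detail. The sketch only shows that O_DSYNC is always requested, O_DIRECT only in the minimal-WAL case, and that an O_DIRECT open can fail with EINVAL on a file system that rejects direct I/O, such as ext4 with data=journal.

```c
#define _GNU_SOURCE  /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical: open() flags for a WAL file under open_datasync. */
int open_datasync_flags(bool wal_minimal)
{
    int flags = O_RDWR | O_DSYNC;  /* every write() is synchronous */
    if (wal_minimal)
        flags |= O_DIRECT;         /* bypass the page cache as well */
    return flags;
}

/* Hypothetical: open `path` with those flags (O_CREAT only for the demo). */
int open_wal_file(const char *path, bool wal_minimal)
{
    assert(path != NULL);
    int fd = open(path, open_datasync_flags(wal_minimal) | O_CREAT, 0600);
    if (fd < 0 && errno == EINVAL && wal_minimal)
        fprintf(stderr, "%s: O_DIRECT rejected with EINVAL "
                "(e.g. ext4 mounted with data=journal)\n", path);
    return fd;
}
```

The EINVAL branch is a diagnostic only; the sketch does not retry without O_DIRECT.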
>
>
I think the use of 'nobarrier' is probably disabling most/all reliable
writing to the devices. What do the numbers look like if you remove this
option?
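To confirm which options a volume is actually mounted with before re-running the tests, a minimal C sketch can scan /proc/mounts using glibc's <mntent.h> interface (setmntent, getmntent, hasmntopt, endmntent). The function name `mount_has_nobarrier` is hypothetical, and it only looks for the literal `nobarrier` option, not other spellings such as `barrier=0`.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the file system mounted at `mount_dir`
 * lists "nobarrier" in /proc/mounts, 0 if it does not, and -1 if no such
 * mount point is listed or /proc/mounts cannot be read.
 */
int mount_has_nobarrier(const char *mount_dir)
{
    assert(mount_dir != NULL);
    FILE *f = setmntent("/proc/mounts", "r");
    if (f == NULL)
        return -1;

    int result = -1;
    struct mntent *m;
    while ((m = getmntent(f)) != NULL) {
        /* Keep scanning: a later mount on the same directory hides earlier ones. */
        if (strcmp(m->mnt_dir, mount_dir) == 0)
            result = hasmntopt(m, "nobarrier") != NULL;
    }
    endmntent(f);
    return result;
}
```

Pass the mount point of the volume that holds pg_wal; a result of 1 means the benchmark figures were taken without write barriers.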
regards
Mark