From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Craig Ringer <craig(at)2ndquadrant(dot)com> |
Cc: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Postgres, fsync, and OSs (specifically linux) |
Date: | 2018-04-30 16:09:45 |
Message-ID: | 20180430160945.5s5qfoqryhtmugxl@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2018-04-30 13:03:24 +0800, Craig Ringer wrote:
> Hrm, something else that just came up. On 9.6+ we use sync_file_range.
> It's surely going to eat errors:
>
> rc = sync_file_range(fd, offset, nbytes,
> SYNC_FILE_RANGE_WRITE);
>
> /* don't error out, this is just a performance optimization */
> if (rc != 0)
> {
> ereport(WARNING,
> (errcode_for_file_access(),
> errmsg("could not flush dirty data: %m")));
> }
It's not. Only SYNC_FILE_RANGE_WAIT_{BEFORE,AFTER} eat errors. Which
seems sensible, because they could be considered data integrity
operations.
fs/sync.c:
SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
unsigned int, flags)
{
...
if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) {
ret = file_fdatawait_range(f.file, offset, endbyte);
if (ret < 0)
goto out_put;
}
if (flags & SYNC_FILE_RANGE_WRITE) {
ret = __filemap_fdatawrite_range(mapping, offset, endbyte,
WB_SYNC_NONE);
if (ret < 0)
goto out_put;
}
if (flags & SYNC_FILE_RANGE_WAIT_AFTER)
ret = file_fdatawait_range(f.file, offset, endbyte);
where
int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte)
{
struct address_space *mapping = file->f_mapping;
__filemap_fdatawait_range(mapping, start_byte, end_byte);
return file_check_and_advance_wb_err(file);
}
EXPORT_SYMBOL(file_fdatawait_range);
the critical call is file_check_and_advance_wb_err(). That's the call
that checks and clears errors.
Would be good to add a kernel xfstest (gets used on all relevant FS
these days), to make sure that doesn't change.
> I'm very suspicious about the safety of the msync() path too.
That seems justified however:
SYSCALL_DEFINE3(msync, unsigned long, start, size_t, len, int, flags)
{
...
error = vfs_fsync_range(file, fstart, fend, 1);
..
*/
int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)
{
...
return file->f_op->fsync(file, start, end, datasync);
}
EXPORT_SYMBOL(vfs_fsync_range);
int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
...
ret = file_write_and_wait_range(file, start, end);
if (ret)
return ret;
...
STATIC int
xfs_file_fsync(
struct file *file,
loff_t start,
loff_t end,
int datasync)
{
...
error = file_write_and_wait_range(file, start, end);
if (error)
return error;
int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
{
...
err2 = file_check_and_advance_wb_err(file);
if (!err)
err = err2;
return err;
}
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2018-04-30 17:07:49 | Re: "could not reattach to shared memory" on buildfarm member dory |
Previous Message | Andrew Dunstan | 2018-04-30 16:09:12 | Re: Intermittent ECPG test failures on Windows buildfarm machines |