Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-04 02:44:22
Message-ID: CAEepm=07BG8K1Xqu05P1pHkygNPP4Ff+9nq-gF21cHiatjEqhw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 4, 2018 at 2:14 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Tue, Apr 3, 2018 at 10:05:19PM -0400, Bruce Momjian wrote:
>> On Wed, Apr 4, 2018 at 01:54:50PM +1200, Thomas Munro wrote:
>> > I believe there were some problems of that nature (with various
>> > twists, based on other concurrent activity and possibly different
>> > fds), and those problems were fixed by the errseq_t system developed
>> > by Jeff Layton in Linux 4.13. Call that "bug #1".
>>
>> So all our non-cutting-edge Linux systems are vulnerable and there is no
>> workaround Postgres can implement? Wow.
>
> Uh, are you sure it fixes our use-case? From the email description it
> sounded like it only reported fsync errors for every open file
> descriptor at the time of the failure, but the checkpoint process might
> open the file _after_ the failure and try to fsync a write that happened
> _before_ the failure.

I'm not sure of anything. I can see that it's designed to report
errors since the last fsync() of the *file* (presumably via any fd),
which sounds like the desired behaviour:

https://github.com/torvalds/linux/blob/master/mm/filemap.c#L682

* When userland calls fsync (or something like nfsd does the equivalent), we
* want to report any writeback errors that occurred since the last fsync (or
* since the file was opened if there haven't been any).

But I'm not sure what the lifetime of the passed-in "file" and more
importantly "file->f_wb_err" is. Specifically, what happens to it if
no one has the file open at all, between operations? It is reference
counted, see fs/file_table.c. I don't know enough about it to
comment.

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2018-04-04 02:47:25 Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
Previous Message Craig Ringer 2018-04-04 02:40:16 Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS