Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Anthony Iliopoulos <ailiop(at)altatus(dot)com>
To: Geoff Winkless <pgsqladmin(at)geoff(dot)dj>
Cc: Greg Stark <stark(at)mit(dot)edu>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-09 12:31:27
Message-ID: 20180409123126.GB4233@ai-wks
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 09, 2018 at 01:03:28PM +0100, Geoff Winkless wrote:
> On 9 April 2018 at 11:50, Anthony Iliopoulos <ailiop(at)altatus(dot)com> wrote:
>
> > What you seem to be asking for is the capability of dropping
> > buffers over the (kernel) fence and idemnifying the application
> > from any further responsibility, i.e. a hard assurance
> > that either the kernel will persist the pages or it will
> > keep them around till the application recovers them
> > asynchronously, the filesystem is unmounted, or the system
> > is rebooted.
> >
>
> That seems like a perfectly reasonable position to take, frankly.

Indeed, as long as you are willing to ignore the consequences of
this design decision: mainly, how you would recover memory when no
application is interested in clearing the error. At which point
other applications with different priorities will find this position
rather unreasonable since there can be no way out of it for them.
Good luck convincing any OS kernel upstream to go with this design.

> The whole _point_ of an Operating System should be that you can do exactly
> that. As a developer I should be able to call write() and fsync() and know
> that if both calls have succeeded then the result is on disk, no matter
> what another application has done in the meantime. If that's a "difficult"
> problem then that's the OS's problem, not mine. If the OS doesn't do that,
> it's _not_doing_its_job_.

No OS kernel that I know of provides any promises for atomicity of a
write()+fsync() sequence, unless one is using O_SYNC. It doesn't
provide you with isolation either, as this is delegated to userspace,
where processes that share a file should coordinate accordingly.

It's not a difficult problem, but rather the kernels provide a common
denominator of possible interfaces and designs that could accommodate
a wider range of potential application scenarios for which the kernel
cannot possibly anticipate requirements. There have been plenty of
experimental works for providing a transactional (ACID) filesystem
interface to applications. On the opposite end, there have been quite
a few commercial databases that completely bypass the kernel storage
stack. But I would assume it is reasonable to figure out something
between those two extremes that can work in a "portable" fashion.

Best regards,
Anthony

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anthony Iliopoulos 2018-04-09 12:54:16 Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Previous Message Heikki Linnakangas 2018-04-09 12:30:39 Re: pgsql: Store 2PC GID in commit/abort WAL recs for logical decoding