From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Craig Ringer <craig(at)2ndquadrant(dot)com> |
Cc: | Greg Stark <stark(at)mit(dot)edu>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Geoff Winkless <pgsqladmin(at)geoff(dot)dj>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS |
Date: | 2018-04-18 11:46:15 |
Message-ID: | 20180418114615.GB20040@momjian.us |
Lists: | pgsql-hackers |
On Wed, Apr 18, 2018 at 06:04:30PM +0800, Craig Ringer wrote:
> On 18 April 2018 at 05:19, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > On Tue, Apr 10, 2018 at 05:54:40PM +0100, Greg Stark wrote:
> >> On 10 April 2018 at 02:59, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> >>
> >> > Nitpick: In most cases the kernel reserves disk space immediately,
> >> > before returning from write(). NFS seems to be the main exception
> >> > here.
> >>
> >> I'm kind of puzzled by this. Surely NFS servers store the data in the
> >> filesystem using write(2) or the in-kernel equivalent? So if the
> >> server is backed by a filesystem where write(2) preallocates space
> >> surely the NFS server must behave as if it's preallocating as well? I
> >> would expect NFS to provide basically the same set of possible
> >> failures as the underlying filesystem (as long as you don't enable
> >> nosync of course).
> >
> > I don't think the write is _sent_ to the NFS at the time of the write,
> > so while the NFS side would reserve the space, it might not get the write
> > request until after we return write success to the process.
>
> It should be sent if you're using sync mode.
>
> From my reading of the docs, if you're using async mode you're already
> open to so many potential corruptions you might as well not bother.
>
> I need to look into this more re NFS and expand the tests I have to
> cover that properly.

So, if sync mode passes the write to NFS, and NFS pre-reserves write
space, and throws an error on reservation failure, that means that NFS
will not corrupt a cluster on out-of-space errors.

So, what about thin provisioning? I can understand sharing _free_ space
among file systems, but once a write arrives I assume it reserves the
space. Is the problem that many thin provisioning systems don't have a
sync mode, so you can't force the write to appear on the device before
an fsync?
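
To make the distinction concrete, here is a minimal Python sketch (not PostgreSQL code) of the two places an out-of-space error can surface: at write() when space is reserved up front, or only at fsync() when allocation is deferred, as suspected for async NFS and thin provisioning. The names `write_and_sync` and `fail_enospc` are hypothetical, and the injectable `write_fn`/`fsync_fn` parameters are an assumption added only so that a failure can be simulated without filling a real device.

```python
import errno
import os
import tempfile


def write_and_sync(path, data, write_fn=os.write, fsync_fn=os.fsync):
    """Write data to path and report the stage at which an error surfaced.

    Returns ("ok", None), ("write", errno) or ("fsync", errno).
    write_fn and fsync_fn are hypothetical hooks for simulating failures.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while len(view) > 0:
            try:
                # With up-front reservation, ENOSPC is reported here.
                written = write_fn(fd, view)
            except OSError as exc:
                return ("write", exc.errno)
            view = view[written:]
        try:
            # With deferred allocation, the error may only appear here,
            # after write() has already reported success to the caller.
            fsync_fn(fd)
        except OSError as exc:
            return ("fsync", exc.errno)
        return ("ok", None)
    finally:
        os.close(fd)


def fail_enospc(*args):
    """Simulated failure: behave like a call that hits ENOSPC."""
    raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "segment")
        print(write_and_sync(target, b"page"))
        print(write_and_sync(target, b"page", write_fn=fail_enospc))
        print(write_and_sync(target, b"page", fsync_fn=fail_enospc))
```

The "fsync" outcome is the case the thread is worried about: the process has already been told that write() succeeded by the time the storage reports that it has no room.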
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +