From: | Craig Ringer <craig(at)2ndquadrant(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS |
Date: | 2018-04-10 12:15:15 |
Message-ID: | CAMsr+YF15Z8ATbpSxBkA9s1PZLYPJw6ar0KtddLpq_hxdfBYiQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 10 April 2018 at 14:10, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> Well, I think that there is place for improving reporting of failure
> in file_utils.c for frontends, or at worst have an exit() for any kind
> of critical failures equivalent to a PANIC.
Yup.
In the mean time, speaking of PANIC, here's the first cut patch to
make Pg panic on fsync() failures. I need to do some closer review and
testing, but it's presented here for anyone interested.
I intentionally left some failures as ERROR not PANIC, where the
entire operation is done as a unit, and an ERROR will cause us to
retry the whole thing.
For example, when we fsync() a temp file before we move it into place,
there's no point panicing on failure, because we'll discard the temp
file on ERROR and retry the whole thing.
I've verified that it works as expected with some modifications to the
test tool I've been using (pushed).
The main downside is that if we panic in redo, we don't try again. We
throw our toys and shut down. But arguably if we get the same I/O
error again in redo, that's the right thing to do anyway, and quite
likely safer than continuing to ERROR on checkpoints indefinitely.
Patch attached.
To be clear, this patch only deals with the issue of us retrying
fsyncs when it turns out to be unsafe. This does NOT address any of
the issues where we won't find out about writeback errors at all.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
v1-0001-PANIC-when-we-detect-a-possible-fsync-I-O-error-i.patch | text/x-patch | 10.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2018-04-10 12:28:29 | Re: [PATCH] pg_hba.conf : new auth option : clientcert=verify-full |
Previous Message | Ashutosh Bapat | 2018-04-10 12:11:55 | Re: [HACKERS] path toward faster partition pruning |