From: | Craig Ringer <craig(at)2ndquadrant(dot)com> |
---|---|
To: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Postgres, fsync, and OSs (specifically linux) |
Date: | 2018-04-29 01:58:45 |
Message-ID: | CAMsr+YF1D=uZ59SRU0ZKq-eTXCmYyoYp8_d=gDTHSCq1PJjP5Q@mail.gmail.com |
Lists: | pgsql-hackers |
On 28 April 2018 at 23:25, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 27 April 2018 at 15:28, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> - Add a pre-checkpoint hook that checks for filesystem errors *after*
>> fsyncing all the files, but *before* logging the checkpoint completion
>> record. Operating systems, filesystems, etc. all log the error format
>> differently, but for larger installations it'd not be too hard to
>> write code that checks their specific configuration.
>>
>> While I'm a bit concerned adding user-code before a checkpoint, if
>> we'd do it as a shell command it seems pretty reasonable. And useful
>> even without concern for the fsync issue itself. Checking for IO
>> errors could e.g. also include checking for read errors - it'd not be
>> unreasonable to not want to complete a checkpoint if there'd been any
>> media errors.
>
> It seems clear that we need to evaluate our compatibility not just
> with an OS, as we do now, but with an OS/filesystem.
>
> Although people have suggested some approaches, I'm more interested in
> discovering how we can be certain we got it right.
TBH, we can't be certain, because there are too many failure modes,
some of which we can't really simulate in practical or automated
ways.
But there are definitely steps we can take:
- Test the stack of FS, LVM (if any) etc with the dmsetup 'flakey'
target and a variety of workloads designed to hit errors at various
points. Some form of torture test.
- Almost fill up the device and see what happens if we write() then fsync()
enough to fill it.
- Plug-pull storage and see what happens, especially for multipath/iSCSI/SAN.
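As a rough illustration of the fill-the-device step, here is a minimal sketch of a write()-then-fsync() probe. It is not taken from any existing test suite; the function name write_then_fsync and its parameters are made up for this example, and it assumes you point it at a small scratch filesystem (for instance one sitting on a dmsetup 'flakey' target) that you are happy to fill or break. It only records which call reported the first error, which is the thing we care about here.

```python
import errno
import os


def write_then_fsync(path, chunk_size=8192, max_bytes=None):
    """Append zero-filled chunks to `path`, calling fsync() after each one.

    Hypothetical helper for illustration only. Returns a tuple
    (bytes_written, failing_call, errno_value), where failing_call is
    "write", "fsync", or None if max_bytes was reached without an error.
    """
    chunk = b"\0" * chunk_size
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        while max_bytes is None or written < max_bytes:
            try:
                n = os.write(fd, chunk)
            except OSError as e:
                # The error surfaced immediately, at write() time.
                return written, "write", e.errno
            try:
                os.fsync(fd)
            except OSError as e:
                # write() was accepted, but the data may never have reached
                # stable storage. This is the case of interest.
                return written, "fsync", e.errno
            written += n
        return written, None, None
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Assumed scratch path; replace with a file on the device under test.
    total, call, err = write_then_fsync("/mnt/scratch/probe.dat")
    name = errno.errorcode.get(err, err) if err is not None else None
    print(f"wrote {total} bytes, first error from {call}: {name}")
```

Which call reports the error depends on the storage stack, so the useful output is the comparison across configurations rather than any single run.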
Experience with pg_test_fsync shows that it can also be hard to
reliably interpret the results of tests.
Again I'd like to emphasise that this is really only a significant
risk for a few configurations. Yes, it could result in Pg not failing
a checkpoint when it should if, say, your disk has a bad block it
can't repair and remap. But as Andres has pointed out in the past,
those sorts of local storage failure cases tend toward "you're kind of
screwed anyway". It's only a serious concern when I/O errors are part
of the storage's accepted operation, as in multipath with default
settings.
We _definitely_ need to warn multipath users that the defaults are insane.
>> - Use direct IO. Due to architectural performance issues in PG and the
>> fact that it'd not be applicable for all installations I don't think
>> this is a reasonable fix for the issue presented here. Although it's
>> independently something we should work on. It might be worthwhile to
>> provide a configuration that allows to force DIO to be enabled for WAL
>> even if replication is turned on.
>
> "Use DirectIO" is roughly same suggestion as "don't trust Linux filesystems".
Surprisingly, that seems to be a lot of what's coming out of Linux
developers. Reliable buffered I/O? Why would you try to do that?
I know that's far from a universal position, though, and it sounds
like things were more productive in Andres's discussions at the meet.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services