Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date:	2018-04-06 02:53:56
Message-ID:	CAEepm=19U-2_kzApS-DqqEkTAnp9meiaRXyi-VTC94fcst6agA@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Apr 6, 2018 at 1:27 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 6 April 2018 at 07:37, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
>> Note: as I've brought up in another thread, it turns out that PG is not
>> handling fsync errors correctly even when the OS _does_ do the right
>> thing (discovered by testing on FreeBSD).
>
> Yikes. For other readers, the related thread for this is
> https://www.postgresql.org/message-id/87y3i1ia4w.fsf@news-spur.riddles.org.uk

Yeah. That's really embarrassing, especially after beating up on
various operating systems all week. It's also an independent issue --
let's keep that on the other thread and get it fixed.

> I see the failed fsync, then the same fd being fsync()d without error on the
> next checkpoint, which succeeds.
>
> postgres 9602 [003] 72380.325817: syscalls:sys_enter_fsync: fd: 0x00000005
> postgres 9602 [003] 72380.325931: syscalls:sys_exit_fsync: 0xfffffffffffffffb
> ...
> postgres 9602 [000] 72381.336767: syscalls:sys_enter_fsync: fd: 0x00000005
> postgres 9602 [000] 72381.336840: syscalls:sys_exit_fsync: 0x0
>
> ... and Pg continues merrily on its way without realising it lost data:
>
> [72379.834872] XFS (dm-0): writeback error on sector 118752
> [72380.324707] XFS (dm-0): writeback error on sector 118688
>
> In this test I set things up so the checkpointer would see the first fsync()
> error. But if I make checkpoints less frequent, the bgwriter aggressive, and
> kernel dirty writeback aggressive, it should be possible to have the failure
> go completely unobserved too. I'll try that next, because we've already
> largely concluded that the solution to the issue above is to PANIC on
> fsync() error. But if we don't see the error at all we're in trouble.

I suppose you only see errors because the file descriptors linger open
in the virtual file descriptor cache, which is a matter of luck
depending on how many relation segment files you touched. One thing
you could try to confirm our understanding of the Linux 4.13+ policy
would be to hack PostgreSQL so that it reopens the file descriptor
every time in mdsync(). See attached.
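
To illustrate the idea outside PostgreSQL, here is a minimal standalone
sketch in plain POSIX C. It is not the attached patch and does not touch
mdsync(); the helper names fsync_via_fresh_fd() and write_then_sync_fresh()
are hypothetical. It writes through one descriptor, closes it, and then
calls fsync() through a newly opened descriptor. If the suspicion above is
right, a writeback error that happened while no descriptor was open may
never be reported to the fresh one, so a zero return here would not prove
the data reached disk.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): fsync a file through a
 * descriptor opened just for this call, rather than one that was already
 * open when the data was written.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
fsync_via_fresh_fd(const char *path)
{
    int fd;
    int rc;
    int saved_errno;

    assert(path != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    rc = fsync(fd);
    saved_errno = errno;    /* close() may overwrite errno */
    close(fd);
    errno = saved_errno;
    return rc;
}

/*
 * Hypothetical helper: write a buffer through one descriptor, close it,
 * then sync through a second, freshly opened descriptor.
 */
static int
write_then_sync_fresh(const char *path, const void *buf, size_t len)
{
    int     fd;
    ssize_t written;

    assert(path != NULL && buf != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    written = write(fd, buf, len);
    if (written < 0 || (size_t) written != len)
    {
        /* A short write leaves errno unset; report it as EIO here. */
        int saved_errno = (written < 0) ? errno : EIO;

        close(fd);
        errno = saved_errno;
        return -1;
    }

    if (close(fd) != 0)
        return -1;

    return fsync_via_fresh_fd(path);
}
```

Running something like write_then_sync_fresh() against a device that
injects write errors, and comparing it with the same write followed by
fsync() on the original descriptor, would be one way to check whether the
error is visible only to descriptors that were open when writeback failed.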

--
Thomas Munro
http://www.enterprisedb.com

Attachment	Content-Type	Size
force-reopen-when-syncing.patch	application/octet-stream	1.5 KB

In response to

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS at 2018-04-06 01:27:05 from Craig Ringer

Responses

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS at 2018-04-06 03:20:22 from Craig Ringer
