| From: | Craig Ringer <craig(at)2ndquadrant(dot)com> | 
|---|---|
| To: | Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> | 
| Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS | 
| Date: | 2018-04-06 01:27:05 | 
| Message-ID: | CAMsr+YEjnSyiMJ63yPDJvw8j95qrwD=pZ5u8e=QKiPX5CEm38w@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On 6 April 2018 at 07:37, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
> Note: as I've brought up in another thread, it turns out that PG is not
> handling fsync errors correctly even when the OS _does_ do the right
> thing (discovered by testing on FreeBSD).
Yikes. For other readers, the related thread for this is
https://www.postgresql.org/message-id/87y3i1ia4w.fsf@news-spur.riddles.org.uk
Meanwhile, I've extended my test to run postgres on a deliberately faulty
volume and confirmed my results there.
_____________
2018-04-06 01:11:40.555 UTC [58] LOG:  checkpoint starting: immediate force
wait
2018-04-06 01:11:40.567 UTC [58] ERROR:  could not fsync file
"base/12992/16386": Input/output error
2018-04-06 01:11:40.655 UTC [66] ERROR:  checkpoint request failed
2018-04-06 01:11:40.655 UTC [66] HINT:  Consult recent messages in the
server log for details.
2018-04-06 01:11:40.655 UTC [66] STATEMENT:  CHECKPOINT
Checkpoint failed with checkpoint request failed
HINT:  Consult recent messages in the server log for details.
Retrying
2018-04-06 01:11:41.568 UTC [58] LOG:  checkpoint starting: immediate force
wait
2018-04-06 01:11:41.614 UTC [58] LOG:  checkpoint complete: wrote 0 buffers
(0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s,
sync=0.000 s, total=0.046 s; sync files=3, longest=0.000 s, average=0.000
s; distance=2727 kB, estimate=2779 kB
Ooops, it worked! We ignored the error and checkpointed OK.
_____________
Given your report, now I have to wonder if we even reissued the fsync() at
all this time. 'perf' time. OK, with
sudo perf record -e syscalls:sys_enter_fsync,syscalls:sys_exit_fsync -a
sudo perf script
I see the failed fync, then the same fd being fsync()d without error on the
next checkpoint, which succeeds.
        postgres  9602 [003] 72380.325817: syscalls:sys_enter_fsync: fd:
0x00000005
        postgres  9602 [003] 72380.325931:  syscalls:sys_exit_fsync:
0xfffffffffffffffb
...
        postgres  9602 [000] 72381.336767: syscalls:sys_enter_fsync: fd:
0x00000005
        postgres  9602 [000] 72381.336840:  syscalls:sys_exit_fsync: 0x0
... and Pg continues merrily on its way without realising it lost data:
[72379.834872] XFS (dm-0): writeback error on sector 118752
[72380.324707] XFS (dm-0): writeback error on sector 118688
In this test I set things up so the checkpointer would see the first
fsync() error. But if I make checkpoints less frequent, the bgwriter
aggressive, and kernel dirty writeback aggressive, it should be possible to
have the failure go completely unobserved too. I'll try that next, because
we've already largely concluded that the solution to the issue above is to
PANIC on fsync() error. But if we don't see the error at all we're in
trouble.
-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Peter Eisentraut | 2018-04-06 01:29:06 | Re: PATCH: psql tab completion for SELECT | 
| Previous Message | Thomas Munro | 2018-04-06 01:11:36 | Re: Checkpoint not retrying failed fsync? |