From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | fsync reliability |
Date: | 2011-04-21 08:26:06 |
Message-ID: | BANLkTinE_Syc3Fh+-F2LhSuiZktMHehBfA@mail.gmail.com |
Lists: | pgsql-hackers |
Daniel Farina points out to me that the Linux man page for fsync() says
"Calling fsync() does not necessarily ensure that the entry in the directory
containing the file has also reached disk. For that an explicit fsync() on a
file descriptor for the directory is also needed."
http://www.kernel.org/doc/man-pages/online/pages/man2/fsync.2.html
That phrase does not appear in the Open Group specification of fsync():
http://pubs.opengroup.org/onlinepubs/007908799/xsh/fsync.html
This point appears to have been discussed before
http://postgresql.1045698.n5.nabble.com/ALTER-DATABASE-SET-TABLESPACE-vs-crash-safety-td1995703.html
Tom said
"We don't try to "fsync the directory" after a normal table create for instance"
which is fine because we don't need to. In the event of a crash a
missing table would be recreated during crash recovery.
However, that raises the question of what happens with WAL. At present,
we do nothing to ensure that "the entry in the directory containing
the file has also reached disk".
ISTM that we can easily do this, since we preallocate WAL files during
RemoveOldXlogFiles() and rarely extend the number of files.
So it seems easily possible to fsync the pg_xlog directory at the end
of RemoveOldXlogFiles(), which is mostly performed by the bgwriter
anyway.
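To make the idea concrete, here is a rough sketch of what the man page
describes: open the directory and fsync() its file descriptor. The helper
name fsync_directory() is hypothetical and is not an existing PostgreSQL
function; a real patch would presumably use the backend's own file access
and error reporting routines rather than raw open() and close().

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * fsync_directory (hypothetical helper, for illustration only)
 *
 * Opens the directory at "path" read-only and calls fsync() on the
 * resulting descriptor, so that directory entries created or renamed
 * inside it are flushed to disk.
 *
 * Returns 0 on success, -1 on failure with errno set.
 */
int
fsync_directory(const char *path)
{
    int fd;
    int rc;
    int saved_errno;

    /* A directory can only be opened read-only. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    rc = fsync(fd);

    /* Preserve the errno from fsync() across close(). */
    saved_errno = errno;
    close(fd);
    errno = saved_errno;

    return rc;
}
```

The call at the end of RemoveOldXlogFiles() would then be something like
fsync_directory("pg_xlog"), assuming the current directory is the data
directory. Some platforms may not allow fsync() on a directory at all, so
a real patch would probably have to decide which errors to tolerate.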
It was also noted that "we've always expected the filesystem to take
care of its own metadata"
which isn't actually stated anywhere in the docs, AFAIK.
Perhaps this is an irrelevant problem these days, but would it hurt to fix?
Happy to do the patch if we agree.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services