Re: fsync reliability

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fsync reliability
Date: 2011-04-22 14:31:18
Message-ID: BANLkTikuVpqWQ4UYfzNq95-dd2AvnkETFA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 21, 2011 at 4:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The traditional standard is that the filesystem is supposed to take
> care of its own metadata, and even Linux filesystems have pretty much
> figured that out.  I don't really see a need for us to be nursemaiding
> the filesystem.  At most there's a documentation issue here, ie, we
> ought to be more explicit about which filesystems and which mount
> options we recommend.

To be fair, the traditional standard was that filesystem metadata was
written synchronously. That is, the creat/rename/unlink calls didn't
return until the metadata had been written. That was never brilliant but
it was simple. It's unclear to me whether that API was chosen because
implementing anything else was hard, or whether it was implemented that
way because defining the API that way was deemed a good idea. I suspect
it was the former.

As APIs go, having metadata operations be buffered and reusing fsync
on the directory to block until they're written seems as sane as
anything else. It's a bit of a pain for us to keep track of which
files have been created or deleted in a directory and fsync the
directory on checkpoint, but that's just because we've already gone to
special efforts to keep track of what data is dirty but haven't done
anything to keep track of which directories have been dirtied.
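
To make the bookkeeping concrete, here is a rough sketch of what
"track dirtied directories and fsync them at checkpoint" could look
like. It is an illustration only, not how PostgreSQL does it: the
DirtyDirTracker class and its method names are hypothetical, and it
assumes a POSIX system where a directory can be opened read-only and
passed to fsync.

```python
import os


class DirtyDirTracker:
    """Hypothetical helper: remembers directories whose entries changed."""

    def __init__(self):
        self._dirty = set()

    def _mark(self, path):
        # Record the parent directory of the file that was added or removed.
        self._dirty.add(os.path.dirname(os.path.abspath(path)))

    def create(self, path):
        # Create an empty file; the new directory entry may still be buffered.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        os.close(fd)
        self._mark(path)

    def unlink(self, path):
        # Remove a file; the directory entry removal may still be buffered.
        os.unlink(path)
        self._mark(path)

    def checkpoint(self):
        # fsync each dirtied directory once, then forget about it.
        flushed = []
        for directory in sorted(self._dirty):
            fd = os.open(directory, os.O_RDONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
            flushed.append(directory)
        self._dirty.clear()
        return flushed
```

The point of the set is that many creates and deletes in one directory
cost a single directory fsync at checkpoint time, the same way dirty
data is batched today.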

--
greg
