Re: fsync-pgdata-on-recovery tries to write to more files than previously

From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Christoph Berg <myon(at)debian(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: fsync-pgdata-on-recovery tries to write to more files than previously
Date: 2015-05-27 06:16:39
Message-ID: 20150527061639.GA31904@toroid.org
Lists: pgsql-hackers

At 2015-05-26 22:44:03 +0200, andres(at)anarazel(dot)de wrote:
>
> So what I propose is:
> 1) Remove the automatic symlink following
> 2) Follow pg_tblspc/*, pg_xlog if it's a symlink, fix the latter in
> initdb -S
> 3) Add an elevel argument to walkdir(), return if AllocateDir() fails,
> continue for stat() failures in the readdir() loop.
> 4) Add elevel argument to pre_sync_fname, fsync_fname, return after
> errors.
> 5) Accept EACCES, ETXTBSY (if defined) when open()ing the files. By
> virtue of not following symlinks we should not need to worry about
> EROFS

Here's a WIP patch for discussion.

I've (a) removed the S_ISLNK() branch in walkdir, (b) reintroduced
walktblspc_links to call walkdir on each of the entries within pg_tblspc
(simpler than trying to make walkdir follow links only for pg_xlog and
under pg_tblspc), (c) called walkdir on pg_xlog if it's a symlink (not
done for initdb -S; will submit separately), (d) added elevel arguments
as described, and (e) ignored EACCES and ETXTBSY.

This correctly fsync()s stuff according to strace, and doesn't die if
there are unreadable files/links in PGDATA.
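
For discussion, the behaviour described above can be sketched in standalone POSIX C roughly as follows. This is not the patch: sketch_fsync_fname and sketch_walkdir are made-up names, there is no elevel or ereport (failures are only printed and counted), the pg_xlog and pg_tblspc special cases are left out, and readdir() errors are not distinguished from end-of-directory.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the patch's fsync_fname().
 * Returns 1 if the file was fsynced, 0 if it was skipped because of a
 * tolerated error, -1 on any other failure (reported, never fatal).
 */
static int
sketch_fsync_fname(const char *path, int is_dir)
{
    int fd;

    assert(path != NULL);

    fd = open(path, is_dir ? O_RDONLY : O_RDWR);
    if (fd < 0)
    {
        /* Unreadable or busy files are skipped rather than treated as errors. */
        if (errno == EACCES)
            return 0;
#ifdef ETXTBSY
        if (errno == ETXTBSY)
            return 0;
#endif
        fprintf(stderr, "could not open \"%s\": %s\n", path, strerror(errno));
        return -1;
    }

    if (fsync(fd) != 0)
    {
        int save_errno = errno;

        close(fd);
        /* Some platforms reject fsync() on a directory; tolerate that. */
        if (is_dir && (save_errno == EBADF || save_errno == EINVAL))
            return 0;
        fprintf(stderr, "could not fsync \"%s\": %s\n", path, strerror(save_errno));
        return -1;
    }

    close(fd);
    return 1;
}

/*
 * Hypothetical walker: recurse into real directories, fsync regular files,
 * and never follow symlinks (lstat, no S_ISLNK branch).
 * Returns the number of failures; *nsynced counts fsynced regular files.
 */
static int
sketch_walkdir(const char *path, int *nsynced)
{
    DIR *dir = opendir(path);
    struct dirent *de;
    int failures = 0;

    if (dir == NULL)
    {
        fprintf(stderr, "could not open directory \"%s\": %s\n", path, strerror(errno));
        return 1;
    }

    while ((de = readdir(dir)) != NULL)
    {
        char sub[4096];
        struct stat st;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        snprintf(sub, sizeof(sub), "%s/%s", path, de->d_name);

        if (lstat(sub, &st) < 0)
        {
            /* Report and carry on with the next entry. */
            fprintf(stderr, "could not stat \"%s\": %s\n", sub, strerror(errno));
            failures++;
            continue;
        }

        if (S_ISREG(st.st_mode))
        {
            int rc = sketch_fsync_fname(sub, 0);

            if (rc > 0)
                (*nsynced)++;
            else if (rc < 0)
                failures++;
        }
        else if (S_ISDIR(st.st_mode))
            failures += sketch_walkdir(sub, nsynced);
        /* Symlinks and anything else are deliberately ignored. */
    }
    closedir(dir);

    /* Finally fsync the directory itself. */
    if (sketch_fsync_fname(path, 1) < 0)
        failures++;
    return failures;
}
```

Calling sketch_walkdir(path, &count) on a directory returns the number of reported failures and leaves the number of fsynced regular files in count; a symlink inside the directory is neither followed nor counted.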

What I haven't done is return if AllocateDir() fails. I'm not convinced
that's correct: it means we won't complain if PGDATA is unreadable (but
that will break other things anyway, so it doesn't matter), and we will
still die if readdir fails rather than opendir.

I'm trying a couple of approaches to that (e.g. using readdir directly
instead of ReadDir), but other suggestions are welcome.
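
As a rough illustration of the "readdir directly" idea, the standalone sketch below tells an opendir() failure apart from a readdir() failure by clearing errno before each readdir() call. It uses plain POSIX calls rather than AllocateDir/ReadDir, and ScanStatus and sketch_scan_dir are hypothetical names invented for the example.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes for this example only. */
typedef enum
{
    SCAN_OK,
    SCAN_OPEN_FAILED,           /* opendir() failed */
    SCAN_READ_FAILED            /* readdir() failed part-way through */
} ScanStatus;

/*
 * Count the entries in "path" (excluding "." and ".."), reporting
 * opendir() and readdir() failures separately instead of dying.
 */
static ScanStatus
sketch_scan_dir(const char *path, int *nentries)
{
    DIR *dir;
    struct dirent *de;
    int save_errno;

    assert(path != NULL && nentries != NULL);
    *nentries = 0;

    dir = opendir(path);
    if (dir == NULL)
    {
        fprintf(stderr, "could not open directory \"%s\": %s\n",
                path, strerror(errno));
        return SCAN_OPEN_FAILED;
    }

    for (;;)
    {
        /* readdir() returns NULL for both end-of-directory and error;
         * only errno tells them apart, so reset it before every call. */
        errno = 0;
        de = readdir(dir);
        if (de == NULL)
            break;
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        (*nentries)++;
    }

    /* Save errno before closedir() has a chance to overwrite it. */
    save_errno = errno;
    closedir(dir);

    if (save_errno != 0)
    {
        fprintf(stderr, "could not read directory \"%s\": %s\n",
                path, strerror(save_errno));
        return SCAN_READ_FAILED;
    }
    return SCAN_OK;
}
```

With the two cases separated like this, a caller can choose a different severity for each, which is the decision the elevel argument would carry in the real code.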

-- Abhijit

Attachment: fsync-wip.patch (text/x-diff, 8.3 KB)
