From: | Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Christoph Berg <myon(at)debian(dot)org>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: fsync-pgdata-on-recovery tries to write to more files than previously |
Date: | 2015-05-28 14:26:15 |
Message-ID: | 20150528142614.GA21390@toroid.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi.
Here's an updated patch for the fsync problem(s).
A few points that may be worth considering:
1. I've made the ReadDir → ReadDirExtended change, but in retrospect I
don't think it's really worth it. Using ReadDir and letting it die
is probably fine. (Aside: I left ReadDirExtended static for now.)
2. Robert, are you comfortable with what fsync_pgdata() does in xlog.c?
I wasn't sure if I should move that to fd.c as well. I think it's
borderline OK for now.
3. There's a comment in fsync_fname that we need to open directories
with O_RDONLY and non-directories with O_RDWR. Does this also apply
to pre_sync_fname (i.e. sync_file_range and posix_fadvise) on any
platforms we care about? It doesn't seem so, but I'm not sure.
4. As a partial aside, why does fsync_fname use OpenTransientFile? It
looks like it should use BasicOpenFile as pre_sync_fname does. We
close the fd immediately after calling fsync anyway. Or is there
some reason I'm missing that we should use OpenTransientFile in
pre_sync_fname too?
(Aside²: it's pointless to specify S_IRUSR|S_IWUSR in fsync_fname
since we're not setting O_CREAT. Minor, so will submit separate
patch.)
5. I made walktblspc_entries use stat() instead of lstat(), so that we
can handle symlinks and directories the same way. Note that we won't
continue to follow links inside the sub-directories because we use
walkdir instead of recursing.
But this depends on S_ISDIR() returning true after stat() on Windows
with a junction. Does that work? And are the entries in pb_tblspc on
Windows actually junctions?
I've tested this with unreadable files in PGDATA and junk inside
pb_tblspc. Feedback welcome.
I have a separate patch to initdb with the corresponding changes, which
I will post after dinner and a bit more testing.
-- Abhijit
Attachment | Content-Type | Size |
---|---|---|
0001-20150528-Skip-unreadable-files-in-fsync_pgdata-follow-only-ex.patch | text/x-diff | 11.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2015-05-28 14:39:15 | Re: pg_upgrade resets timeline to 1 |
Previous Message | Bruce Momjian | 2015-05-28 14:21:17 | Re: pg_upgrade resets timeline to 1 |