From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fsync-pgdata-on-recovery tries to write to more files than previously |
Date: | 2015-05-26 20:44:03 |
Message-ID: | 20150526204403.GG5310@alap3.anarazel.de |
Lists: | pgsql-hackers |
On 2015-05-26 19:07:20 +0200, Andres Freund wrote:
> It is somewhat interesting that similar code has been used in
> pg_upgrade, via initdb -S, for a while now, without, to my knowledge,
> causing any reported problems. I think the relevant difference is that
> that code doesn't follow symlinks. It's obviously also less exercised,
> and people might just have fixed up permissions when encountering trouble.
>
> Abhijit, do you recall why the code was changed to follow all symlinks
> in contrast to explicitly going through the tablespaces as initdb -S
> does? I'm pretty sure early versions of the patch pretty much had a
> verbatim copy of the initdb logic? That logic is missing pg_xlog btw,
> which is bad for pg_upgrade.
So, this was discussed in the following thread, starting at:
http://archives.postgresql.org/message-id/20150403163232.GA28444%40eldon.alvh.no-ip.org
"Actually, since surely we must follow symlinks everywhere, why do we
have to do this separately for pg_tblspc? Shouldn't that link-following
occur automatically when walking PGDATA in the first place?"
I don't think it's true that we must follow symlinks everywhere. I
think, as argued upthread, that it's sufficient to recurse through
PGDATA, follow the symlinks in pg_tblspc, and, if pg_xlog is a symlink,
also go through it separately. There are no other places where it's
"allowed" to introduce symlinks, and we have rejected bug reports from
people having problems after doing that.
So what I propose is:
1) Remove the automatic symlink following
2) Follow pg_tblspc/*, and pg_xlog if it's a symlink; fix the latter in
initdb -S
3) Add an elevel argument to walkdir(), return if AllocateDir() fails,
continue for stat() failures in the readdir() loop.
4) Add elevel argument to pre_sync_fname, fsync_fname, return after
errors.
5) Accept EACCES and ETXTBSY (if defined) when open()ing the files. By
virtue of not following symlinks we should not need to worry about
EROFS
I'm inclined to think that 4) is a big enough compat break that a
fsync_fname_ext with the new argument is a good idea.
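To make 4) and 5) a bit more concrete, here is a minimal standalone sketch of what such a variant could look like. Everything in it is an assumption for illustration: enum sketch_level stands in for elevel, fprintf() for ereport(), exit() for the fact that reporting at ERROR does not return, and the file is opened read-only just to keep the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the elevel argument. */
enum sketch_level
{
    SKETCH_LOG,                 /* report and return to the caller */
    SKETCH_ERROR                /* report and do not return */
};

static void
sketch_report(enum sketch_level level, const char *what,
              const char *path, int err)
{
    fprintf(stderr, "%s: could not %s \"%s\": %s\n",
            level == SKETCH_ERROR ? "ERROR" : "LOG",
            what, path, strerror(err));
    if (level == SKETCH_ERROR)
        exit(1);
}

/* open() failures that item 5 proposes to accept silently. */
static bool
sketch_open_error_is_ignorable(int err)
{
    if (err == EACCES)
        return true;
#ifdef ETXTBSY
    if (err == ETXTBSY)
        return true;
#endif
    return false;
}

/*
 * Returns 0 if the file was synced or deliberately skipped, -1 if an
 * error was reported at a level that lets us return.
 */
static int
sketch_fsync_fname_ext(const char *path, enum sketch_level level)
{
    int         fd;
    int         save_errno;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
    {
        if (sketch_open_error_is_ignorable(errno))
            return 0;
        sketch_report(level, "open", path, errno);
        return -1;
    }

    if (fsync(fd) != 0)
    {
        save_errno = errno;     /* close() may clobber errno */
        close(fd);
        sketch_report(level, "fsync", path, save_errno);
        return -1;
    }

    close(fd);
    return 0;
}
```

Callers that want the old behaviour would pass the ERROR-like level; the recovery-time walk would pass the LOG-like level and carry on with the next file.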
Arguments for/against?