Re: fsync-pgdata-on-recovery tries to write to more files than previously

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Christoph Berg <myon(at)debian(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: fsync-pgdata-on-recovery tries to write to more files than previously
Date: 2015-05-28 00:22:18
Message-ID: 15791.1432772538@sss.pgh.pa.us
Lists: pgsql-hackers

Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com> writes:
> At 2015-05-27 11:46:39 +0530, ams(at)2ndQuadrant(dot)com wrote:
>> I'm trying a couple of approaches to that (e.g. using readdir directly
>> instead of ReadDir), but other suggestions are welcome.

> Here's what that looks like, but not yet fully tested.

I doubt that that (not using AllocateDir) is a good idea in the backend,
mainly because you are going to leak directory references if anything
called by walkdir() throws elog(ERROR). While that doesn't matter much
for the immediate usage, since we'd throw up our hands and die anyway,
it completely puts the kibosh on any idea that walkdir() is a
general-purpose subroutine. It would have to be decorated with big red
warnings NEVER USE THIS FUNCTION. Ick.

What I think is a reasonable compromise is to treat AllocateDir failure as
nonfatal, but to continue using ReadDir etc., which means that any post-open
failure in reading a successfully opened directory would be fatal. Such
errors would suggest that something is badly wrong in the filesystem, so
I do not feel too terrible about not covering that case.
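
To illustrate the compromise described above, here is a minimal standalone
sketch. It is not the patch under discussion: POSIX opendir()/readdir() stand
in for the backend's AllocateDir()/ReadDir(), fprintf()/exit() stand in for
ereport(), and the names walkdir_sketch, walk_action and count_file are
hypothetical. A directory that cannot be opened is reported and skipped, while
a read failure on an already-open directory ends the process.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical callback type: called once per regular file found. */
typedef void (*walk_action) (const char *path, void *arg);

/* Hypothetical example action: count the files visited. */
void
count_file(const char *path, void *arg)
{
    (void) path;
    (*(int *) arg)++;
}

/*
 * Sketch only; opendir()/readdir() are stand-ins for AllocateDir()/ReadDir().
 * Returns the number of directories that could not be opened.
 */
int
walkdir_sketch(const char *path, walk_action action, void *arg)
{
    DIR        *dir;
    int         unopened = 0;

    assert(path != NULL && action != NULL);

    dir = opendir(path);
    if (dir == NULL)
    {
        /* Open failure is nonfatal: report it and let the caller carry on. */
        fprintf(stderr, "LOG: could not open directory \"%s\": %s\n",
                path, strerror(errno));
        return 1;
    }

    for (;;)
    {
        struct dirent *de;
        struct stat st;
        char        sub[4096];

        errno = 0;
        de = readdir(dir);
        if (de == NULL)
        {
            if (errno == 0)
                break;          /* normal end of directory */
            /* Post-open read failure is treated as fatal. */
            fprintf(stderr, "FATAL: could not read directory \"%s\": %s\n",
                    path, strerror(errno));
            exit(1);
        }
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        snprintf(sub, sizeof(sub), "%s/%s", path, de->d_name);
        if (lstat(sub, &st) != 0)
        {
            fprintf(stderr, "LOG: could not stat \"%s\": %s\n",
                    sub, strerror(errno));
            continue;
        }
        /* lstat() does not follow symlinks, so symlinks are skipped here. */
        if (S_ISDIR(st.st_mode))
            unopened += walkdir_sketch(sub, action, arg);
        else if (S_ISREG(st.st_mode))
            action(sub, arg);
    }

    closedir(dir);
    return unopened;
}
```

A caller would pass a starting path plus an action, for example
walkdir_sketch(path, count_file, &n) with an int n initialized to zero. In the
backend, the point of AllocateDir() is that the open directory is released if
an error is thrown partway through; this standalone version has no such
cleanup and simply exits.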

Meanwhile, it seems like you have copied a couple of very dubious
decisions in initdb's walktblspc_links():

1. It doesn't say a word if the passed-in path (which in practice
is always $PGDATA/pg_tblspc) doesn't exist.

2. It assumes that every entry within pg_tblspc must be a tablespace
directory (or symlink to one), and fails if any are plain files.

At the very least, these behaviors seem inconsistent. #2 is making
strong assumptions about pg_tblspc being correctly set up and free
of user-added cruft, while #1 doesn't even assume it's there at all.

Now, we don't really have to do anything about these things in order
to fix the immediate problem, but I wonder if we shouldn't try to
clean up initdb's behavior while we're at it.

Independently of that, I thought the plan was to complain about any
problems at LOG message level, not DEBUG1, and definitely not WARNING
(which is lower than LOG for this purpose). And why would you use a
different message level for pg_xlog/ than for other files anyway?
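
The remark that WARNING is lower than LOG "for this purpose" refers to the
severity ordering used for the server log, which per the documentation of
log_min_messages places LOG between ERROR and FATAL. A small sketch of that
ordering follows; the enum and function names are hypothetical and are not
PostgreSQL's own definitions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical severity list, in ascending order as documented for
 * log_min_messages (DEBUG5..DEBUG2 omitted for brevity).
 */
enum sketch_level
{
    SK_DEBUG1,
    SK_INFO,
    SK_NOTICE,
    SK_WARNING,
    SK_ERROR,
    SK_LOG,                     /* for the server log, LOG outranks ERROR */
    SK_FATAL,
    SK_PANIC
};

/* Would a message at level msg be written when the threshold is min_level? */
bool
reaches_server_log(enum sketch_level msg, enum sketch_level min_level)
{
    assert(msg <= SK_PANIC && min_level <= SK_PANIC);
    return msg >= min_level;
}
```

Under this ordering, with the threshold set to SK_ERROR a message at SK_LOG is
still written while one at SK_WARNING is dropped, which is why a complaint
that must reach the server log is better issued at LOG.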

regards, tom lane
