Is it OK to ignore directory open failure in ResetUnloggedRelations?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Is it OK to ignore directory open failure in ResetUnloggedRelations?
Date: 2017-12-04 20:15:08
Message-ID: 21040.1512418508@sss.pgh.pa.us
Lists: pgsql-hackers

While working through Michael Paquier's patch to clean up inconsistent
usage of AllocateDir(), I noticed that ResetUnloggedRelations and its
subroutines are not consistent about whether a directory open failure
results in erroring out or just emitting a LOG message and continuing.
ResetUnloggedRelations itself throws a hard error if it fails to open
pg_tblspc, but all the rest of reinit.c thinks a LOG message is
sufficient.

My first thought was to change ResetUnloggedRelations to match the
rest, but on reflection I'm less sure about that. What we've got
at the moment is that a possibly-transient directory open failure
can result in failure to reset an unlogged relation to empty,
which to me amounts to data corruption. If the contents of the
unlogged relation are inconsistent, which is plenty likely after
a crash, we could end up crashing later because of that; and in
any case the user would not see what they expect in the tables.

So now I'm thinking we should do the reverse and change these functions
to give a hard error on AllocateDir failure. That would result in
startup-process failure if we are unable to scan the database, which is
not great, but there's certainly something badly wrong if we can't.
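
To make the two behaviors concrete, here is a minimal standalone sketch.
It is not the code in reinit.c: it uses plain POSIX opendir() rather than
AllocateDir(), and the names scan_dir, DirFailPolicy, ON_FAIL_LOG and
ON_FAIL_ERROR are hypothetical, chosen only for illustration. The point
it tries to show is that under the log-and-continue policy a failed open
is indistinguishable from an empty directory, so the caller silently
skips whatever cleanup it would have done there.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical policy switch; not a PostgreSQL symbol. */
typedef enum
{
	ON_FAIL_LOG,				/* report the failure and keep going */
	ON_FAIL_ERROR				/* report the failure and make the caller stop */
} DirFailPolicy;

/*
 * Count the entries in "path", ignoring "." and "..".
 *
 * If the directory cannot be opened:
 *   ON_FAIL_LOG   -> print a LOG-style line and return 0, which the caller
 *                    cannot tell apart from "directory was empty".
 *   ON_FAIL_ERROR -> print a FATAL-style line and return -1, so the caller
 *                    is forced to notice and abort.
 */
int
scan_dir(const char *path, DirFailPolicy policy)
{
	DIR		   *dir;
	struct dirent *de;
	int			nentries = 0;

	assert(path != NULL);

	dir = opendir(path);
	if (dir == NULL)
	{
		fprintf(stderr, "%s: could not open directory \"%s\": %s\n",
				policy == ON_FAIL_ERROR ? "FATAL" : "LOG",
				path, strerror(errno));
		return (policy == ON_FAIL_ERROR) ? -1 : 0;
	}

	while ((de = readdir(dir)) != NULL)
	{
		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;
		nentries++;				/* real code would reset the relation here */
	}

	closedir(dir);
	return nentries;
}
```

With ON_FAIL_LOG, a caller looping over several directories simply moves
on to the next one after a failure, which corresponds to the case where
an unlogged relation is left un-reset. With ON_FAIL_ERROR, the negative
return stands in for the hard error that would stop the startup process.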

Thoughts?

regards, tom lane
