Re: Is it OK to ignore directory open failure in ResetUnloggedRelations?

From: David Steele <david(at)pgmasters(dot)net>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Is it OK to ignore directory open failure in ResetUnloggedRelations?
Date: 2017-12-05 04:37:09
Message-ID: 8cb076b9-7b41-69e0-dd54-f6e57ead4a97@pgmasters.net
Lists: pgsql-hackers

Hi Tom,

On 12/4/17 3:15 PM, Tom Lane wrote:
> While working through Michael Paquier's patch to clean up inconsistent
> usage of AllocateDir(), I noticed that ResetUnloggedRelations and its
> subroutines are not consistent about whether a directory open failure
> results in erroring out or just emitting a LOG message and continuing.
> ResetUnloggedRelations itself throws a hard error if it fails to open
> pg_tblspc, but all the rest of reinit.c thinks a LOG message is
> sufficient.

By a strange coincidence I spent a while today reading through this code...

> My first thought was to change ResetUnloggedRelations to match the
> rest, but on reflection I'm less sure about that. What we've got
> at the moment is that a possibly-transient directory open failure
> can result in failure to reset an unlogged relation to empty,
> which to me amounts to data corruption.

I'm wondering how this transient directory open failure is going to
happen without a bunch of other things going wrong, but I agree that if
it happens then corruption would be the likely result.

> If the contents of the
> unlogged relation are inconsistent, which is plenty likely after
> a crash, we could end up crashing later because of that; and in
> any case the user would not see what they expect in the tables.

Agreed.

> So now I'm thinking we should do the reverse and change these functions
> to give a hard error on AllocateDir failure. That would result in
> startup-process failure if we are unable to scan the database, which is
> not great, but there's certainly something badly wrong if we can't.

+1. If a tablespace or database directory cannot be opened then I don't
think it makes any sense to continue.
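
To make the difference between the two behaviours concrete, here is a
minimal standalone sketch. It is not the reinit.c code: it uses plain POSIX
opendir() rather than AllocateDir(), and scan_dir() and OpenFailPolicy are
hypothetical names invented for this illustration. In the backend a hard
error would be raised through ereport(ERROR) and would not return; the
sketch models that with a -1 return value instead.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical policy switch, for illustration only. */
typedef enum
{
    ON_FAIL_LOG,                /* report the failure and carry on */
    ON_FAIL_ERROR               /* report the failure and make the caller stop */
} OpenFailPolicy;

/*
 * Count the entries in "path", skipping "." and "..".
 *
 * Returns the entry count on success.  If the directory cannot be opened,
 * returns 0 under ON_FAIL_LOG and -1 under ON_FAIL_ERROR.
 */
static int
scan_dir(const char *path, OpenFailPolicy policy)
{
    DIR        *dir;
    struct dirent *de;
    int         count = 0;

    assert(path != NULL);

    dir = opendir(path);
    if (dir == NULL)
    {
        int         save_errno = errno;

        fprintf(stderr, "%s: could not open directory \"%s\": %s\n",
                policy == ON_FAIL_ERROR ? "ERROR" : "LOG",
                path, strerror(save_errno));

        /*
         * Under ON_FAIL_LOG the caller sees 0, exactly as it would for an
         * empty directory, so nothing tells it that work was skipped.
         */
        return (policy == ON_FAIL_ERROR) ? -1 : 0;
    }

    while ((de = readdir(dir)) != NULL)
    {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        count++;
    }

    closedir(dir);
    return count;
}
```

With ON_FAIL_LOG, a directory that could not be opened is indistinguishable
from an empty one, so whatever was supposed to be reset in it is silently
left alone. With ON_FAIL_ERROR the caller gets a distinct result and can
refuse to continue, which is the behaviour being proposed here.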

Regards,
--
-David
david(at)pgmasters(dot)net
