Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PanBian <bianpan2016(at)163(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()
Date: 2017-11-28 00:35:46
Message-ID: CAB7nPqS5ua9ktEX7p_2WzFcTVCzo0zP9xAx4A7thPwV-8kq7bA@mail.gmail.com
Lists: pgsql-bugs

On Tue, Nov 28, 2017 at 12:08 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> PanBian <bianpan2016(at)163(dot)com> writes:
>> On Mon, Nov 27, 2017 at 07:53:30PM +0900, Michael Paquier wrote:
>>> You are missing the fact that ReadDir goes through ReadDirExtended,
>>> which drops an ERROR log if the folder allocated is NULL.
>
>> You are right. It's my carelessness. ReadDir will not return on a
>> NULL dir parameter. The code is bug free. Sorry for the trouble.
>
> There are some issues here, though:
>
> 1. twophase.c does this:
>
>     cldir = AllocateDir(TWOPHASE_DIR);
>     LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
>     while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
>     {
>
> which is flat out wrong because LWLockAcquire might well clobber errno.
> I don't see any good reason why we couldn't just swap the order of
> those two calls.

I have not checked whether it actually changes errno, but relying on
it leaving errno untouched sucks.
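
To make the ordering concern concrete, here is a minimal standalone
sketch using plain POSIX opendir() rather than the backend's
AllocateDir()/ReadDir(). The helper noisy_call() is hypothetical and
only stands in for any call that might overwrite errno between the
open and the deferred NULL check; it is not a claim about what
LWLockAcquire() really does.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a call that may overwrite errno. */
static void
noisy_call(void)
{
    errno = EINTR;
}

/*
 * Risky order: open, run unrelated work, then look at errno.
 * Returns the errno value a deferred NULL check would report,
 * or 0 if the directory could be opened.
 */
static int
errno_seen_risky(const char *path)
{
    DIR        *dir = opendir(path);    /* sets errno on failure */
    int         seen;

    noisy_call();                       /* may clobber errno */
    seen = (dir == NULL) ? errno : 0;
    if (dir != NULL)
        closedir(dir);
    return seen;
}

/*
 * Safer order: run the unrelated work first, so that nothing sits
 * between the open and the check that reads errno.
 */
static int
errno_seen_safe(const char *path)
{
    DIR        *dir;
    int         seen;

    noisy_call();
    dir = opendir(path);                /* sets errno on failure */
    seen = (dir == NULL) ? errno : 0;
    if (dir != NULL)
        closedir(dir);
    return seen;
}
```

With a path that does not exist, the risky variant reports whatever
noisy_call() left behind, while the safe variant reports the error
from opendir() itself.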

> 2. basebackup.c and some other places do things like
>
>     dir = AllocateDir("pg_wal");
>     if (!dir)
>         ereport(ERROR,
>                 (errmsg("could not open directory \"%s\": %m", "pg_wal")));
>     while ((de = ReadDir(dir, "pg_wal")) != NULL)
>
> Not only is this a waste of code, because the error message is no better
> than what ReadDir would provide, but it's wrong because it omits
> errcode_for_file_access(), causing the SQLSTATE to be reported as XX000.
> There are other places that are even lazier and use elog(), failing
> translatability as well as the errcode test.

I agree with using ereport() everywhere. A path may have been created
by initdb, but anything used after a base backup should be reported to
the user.
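
As an illustration of the SQLSTATE point, here is a simplified
standalone sketch of the kind of errno-to-SQLSTATE mapping that
errcode_for_file_access() provides. The function name
sqlstate_for_file_errno() is hypothetical and only a few cases are
shown; the real function lives in the backend and covers more errno
values. The point is that without such a mapping the report falls
through to XX000.

```c
#include <errno.h>

/*
 * Hypothetical, reduced stand-in for the mapping done by
 * errcode_for_file_access().  Returns a five-character SQLSTATE.
 */
static const char *
sqlstate_for_file_errno(int err)
{
    switch (err)
    {
        case ENOENT:
            return "58P01";     /* undefined_file */
        case EACCES:
        case EPERM:
            return "42501";     /* insufficient_privilege */
        case ENOSPC:
            return "53100";     /* disk_full */
        default:
            return "XX000";     /* internal_error */
    }
}
```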

> There might be some other problems I missed in a quick scan.
>
> So there's definitely room for a cleanup patch here, but the originally
> proposed change isn't it.

I have spotted more problems. In pg_available_extensions, a failure
of AllocateDir() is ignored, but errno is not reset afterwards. In
perform_base_backup, CheckXLogRemoved() is called without saving
errno, so the error message generated afterwards may be wrong.
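
For the second case, the usual remedy is to save errno before the
intervening call and restore it before reporting. A minimal
standalone sketch follows; intervening_call() is a hypothetical
placeholder that overwrites errno, and the failure is simulated by
assigning ENOENT directly.

```c
#include <errno.h>

/* Hypothetical placeholder for a call made between failure and report. */
static void
intervening_call(void)
{
    errno = EINVAL;
}

/* Reports whatever errno holds after the intervening call. */
static int
reported_errno_unsaved(void)
{
    errno = ENOENT;             /* pretend a file operation just failed */
    intervening_call();
    return errno;               /* no longer the original failure */
}

/* Saves errno across the intervening call and restores it. */
static int
reported_errno_saved(void)
{
    int         save_errno;

    errno = ENOENT;             /* pretend a file operation just failed */
    save_errno = errno;
    intervening_call();
    errno = save_errno;         /* restore before building the message */
    return errno;
}
```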

I think that this requires its own new thread on -hackers, with a more
extended analysis, to attract attention. This goes way beyond the
original complaint about a pointer dereference, and there is a
collection of small issues. I'll try to look at that after I am done
with my CF duties, unless someone beats me to it or volunteers.
--
Michael
