Re: pg_upgrade test failure

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_upgrade test failure
Date: 2022-10-03 03:03:12
Message-ID: CA+hUKG+bk4KGwYT4JrZEdAgNdFm-c8ezTrxf-PacS-npQ_HdSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Mon, Oct 3, 2022 at 1:40 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Mon, Oct 03, 2022 at 12:10:06PM +1300, Thomas Munro wrote:
> > I think something like the attached should do the right thing for
> > STATUS_DELETE_PENDING (sort of "ENOENT-in-progress"). unlink() goes
> > back to being blocking (sleep+retry until eventually we reach ENOENT
> > or we time out and give up with EACCES), but we still distinguish it
> > from true ENOENT so we have a fast exit in that case. This is passing
> > CI, but not tested yet.
>
> if (lstat(path, &st) < 0)
> - return -1;
> + {
> + if (lstat_error_was_status_delete_pending())
> + is_lnk = false;
> + else
> + return -1;
> + }
> + else
> + is_lnk = S_ISLNK(st.st_mode);

> Sorry, I don't remember all the details in this area, but a directory
> can never be marked as STATUS_DELETE_PENDING with some of its contents
> still inside, right?

That's my understanding, yes: just like Unix, you can't remove a
directory with something in it. Unlike Unix, that includes files that
have been unlinked but are still open somewhere. (Note that in this
case it's not exactly a real directory, it's a junction point, which
is a directory but it doesn't have contents, it is a reparse point
pointing somewhere else, so I suspect that it can't really suffer from
ENOTEMPTY, but it probably can suffer from 'someone has it open for a
short time because they are concurrently stat-ing it'.)

> If it has some contents, forcing unlink() all
> the time would be fine?

Here's why I think it's probably OK to use unlink() unconditionally
after detecting STATUS_DELETE_PENDING. AFAICT there is no way to even
find out if it's a file or a junction in this state, but it doesn't
matter: we are not waiting for rmdir() or unlink() to succeed, we are
waiting for it to fail with an error other than EACCES, most likely
ENOENT (or to time out, perhaps because someone held the file open for
11 seconds, or because EACCES was actually telling us about a
permissions problem). EACCES is the errno for many things including
STATUS_DELETE_PENDING and also "you called unlink() but it's a
directory" (should be EPERM according to POSIX, or EISDIR according
to Linux). Both of those reasons imply that the zombie directory
entry still exists, and we don't care which of those reasons triggered
it. So I think that setting is_lnk = false is good enough here. Do
you see a hole in it?

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Michael Paquier 2022-10-03 06:28:51 Re: pg_upgrade test failure
Previous Message Michael Paquier 2022-10-03 00:40:20 Re: pg_upgrade test failure

Browse pgsql-hackers by date

  From Date Subject
Next Message Po-Chuan Hsieh 2022-10-03 03:55:32 [PATCH] Fix build with LLVM 15 or above
Previous Message Masahiko Sawada 2022-10-03 02:59:13 Re: [PATCH] Log details for client certificate failures