Re: pg_upgrade test failure

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_upgrade test failure
Date: 2022-10-17 20:47:37
Message-ID: CA+hUKGKsdzw06c5nnb=KYG9GmvyykoVpJA_VR3k0r7cZOKcx6Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Mon, Oct 3, 2022 at 7:29 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Mon, Oct 03, 2022 at 04:03:12PM +1300, Thomas Munro wrote:
> > So I think that setting is_lnk = false is good enough here. Do
> > you see a hole in it?
>
> I cannot think on one, on top of my head. Thanks for the
> explanation.

Some things I learned while trying to understand how I managed to
introduce that bug despite reading and testing:

* the code in pgunlink() has comments saying that its retry loop is to
handle sharing violations
* in fact that retry loop also comes into play for STATUS_DELETE_PENDING
* that case is fairly well hidden, because to reach it you need to
unlink(pathname) twice! the second call will wait up to 10 seconds
for handles to close and then report ENOENT, allowing rmdir(parent) to
succeed
* I guess this code is relying on that double-unlink to block until
the directory is empty?
* you wouldn't notice any of this if you were testing on Windows 10 on
a desktop/laptop/VM, because it now uses POSIX semantics for unlink on
NTFS, so the first unlink truly (synchronously) unlinks (no more
STATUS_DELETE_PENDING)
* Server 2019, as used on CI, still uses the traditional NT semantics
(unlink is asynchronous, when all handles closes)
* the fix I proposed has the right effect (I will follow up with tests
to demonstrate)

I'll post my tests for this and a bunch more things I figured out
shortly in a new Windows-filesystem-semantics-omnibus thread.

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Michael Paquier 2022-10-18 01:23:00 pgsql: Rename SetSingleFuncCall() to InitMaterializedSRF()
Previous Message Bruce Momjian 2022-10-17 19:21:57 pgsql: doc: move the mention of aggregate JSON functions up in section

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-10-17 21:14:21 Re: problems with making relfilenodes 56-bits
Previous Message Andres Freund 2022-10-17 20:30:02 Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock