Re: Weird failure with latches in curculio on v15

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <fujii(at)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Weird failure with latches in curculio on v15
Date: 2023-02-21 00:32:10
Message-ID: CA+hUKGL6Gfk302Fywus72J4zLbXcFiC3PTCjGigsHpLKNAfAjA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 21, 2023 at 1:03 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> Perhaps beginning a new thread with a patch and a summary would be
> better at this stage? Another thing I am wondering is if it could be
> possible to test that rather reliably. I have been playing with a few
> scenarios like holding the system() call for a bit with hardcoded
> sleep()s, without much success. I'll try harder on that part.. It's
> been mentioned as well that we could just move away from system() in
> the long-term.

I've been experimenting with ideas for master, which I'll start a new
thread about. Actually I was already thinking about this before this
broken signal handler stuff came up, because it was already
unacceptable that all these places that are connected to shared memory
ignore interrupts for an unbounded time while a shell script or other
command runs. At first I thought it would be relatively simple to replace
system() with something that has a latch wait loop (though there are
details to argue about, like whether you want to allow interrupts that
throw, and if so, how you clean up the subprocess; each of those has
several plausible solutions). But once I started looking at the related
popen-based stuff where you want to communicate with the subprocess
(for example COPY FROM PROGRAM), I realised that it needs more
analysis and work: that stuff is currently entirely based on stdio
FILE (that is, fread() and fwrite()), but it's not really possible (at
least portably) to make that nonblocking, and in fact it's a pretty
terrible interface in terms of error reporting in general. I've been
sketching/thinking about a new module called 'subprocess', with a
couple of ways to start processes, and interact with them via
WaitEventSet and direct pipe I/O; or if buffering is needed, it'd be
our own, not <stdio.h>'s. But don't let me stop anyone else proposing
ideas.
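
To make the "system() with a wait loop" idea concrete, here is a minimal
standalone sketch, not the proposed patch. It assumes plain POSIX rather
than PostgreSQL's latch machinery: the interrupt_pending flag and the
run_command_interruptibly() name are hypothetical stand-ins, and a short
poll() timeout takes the place of a real latch wait. It picks one of the
plausible cleanup policies (SIGTERM the child, then reap it).

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a backend's "interrupt pending" flag. */
static volatile sig_atomic_t interrupt_pending = 0;

/*
 * Run "sh -c command" like system(), but check for interrupts while
 * waiting instead of blocking in wait() for an unbounded time.
 *
 * Returns the child's wait status, or -1 on error or if the wait was
 * interrupted (in which case the child has been signalled and reaped).
 */
static int
run_command_interruptibly(const char *command, int poll_interval_ms)
{
    pid_t   pid;
    int     status;

    assert(command != NULL);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        /* Child: hand the command to the shell, as system() does. */
        execl("/bin/sh", "sh", "-c", command, (char *) NULL);
        _exit(127);             /* exec failed */
    }

    for (;;)
    {
        pid_t   r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return status;      /* child has exited */
        if (r < 0 && errno != EINTR)
            return -1;

        if (interrupt_pending)
        {
            /* One possible policy: terminate the child and reap it. */
            kill(pid, SIGTERM);
            while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
                ;
            return -1;
        }

        /* Stand-in for a latch wait: sleep briefly, then re-check. */
        poll(NULL, 0, poll_interval_ms);
    }
}
```

A real version would presumably wake on the process latch and on child
exit rather than poll on a timer, and would have to decide whether
interrupts may throw from inside the loop.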
