Re: IPC::Run accepts bug reports

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: IPC::Run accepts bug reports
Date: 2024-06-19 21:53:54
Message-ID: 20240619215354.20@rfd.leadboat.com
Lists: pgsql-hackers

On Tue, Jun 18, 2024 at 08:07:27PM -0700, Andres Freund wrote:
> > > > 1) Sometimes hangs hard on windows if started processes have not been shut
> > > > down before script exits.

> It reliably reproduces if I comment out
> the lines below
> # explicitly shut down psql instances gracefully - to avoid hangs
> # or worse on windows
> in 021_row_visibility.pl
>
> The logfile ends in
> Warning: unable to close filehandle GEN25 properly: Bad file descriptor during global destruction.
> Warning: unable to close filehandle GEN20 properly: Bad file descriptor during global destruction.
>
>
> Even if I cancel the test, I can't rerun it because, due to a leftover psql,
> a) a new temp install can't be made (could be solved by rm -rf)
> b) the test's logfile can't be removed (couldn't even rename the directory)
>
> The psql instance needs to be found and terminated first.

Thanks for that recipe. I've put that in my queue to fix.

On Tue, Jun 18, 2024 at 12:00:13PM -0700, Andres Freund wrote:
> On 2024-06-18 10:10:17 -0700, Noah Misch wrote:
> > On Mon, Jun 17, 2024 at 11:11:17AM -0700, Andres Freund wrote:
> > > 2) If a subprocess dies at an inopportune moment, IPC::Run dies with "ack
> > > Broken pipe:" (in _do_filters()). There are plenty of reports of this on the
> > > list, and I've hit this several times personally. It seems to be timing
> > > dependent; I've encountered it after seemingly irrelevant ordering changes.
> > >
> > > I suspect I could create a reproducer with a bit of time.
> >
> > I've seen that one. If the harness has data to write to a child, the child
> > exiting before the write is one way to reach that. Perhaps before exec(),
> > IPC::Run should do a non-blocking write from each pending IO. That way, small
> > writes never experience the timing-dependent behavior.
>
> I think the question is rather, why is IPC::Run choosing to die in this
> situation, and can that be fixed?

With default signal handling, the process would die from SIGPIPE. Since
PostgreSQL::Test ignores SIGPIPE, the write fails with EPIPE instead, and
IPC::Run dies with the "Broken pipe" error. The IPC::Run source tree has no
discussion of ignoring SIGPIPE, so I bet this behavior wasn't a conscious
decision. Perhaps it can do better.
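
To illustrate the mechanism only (this is a Python sketch, not IPC::Run or
PostgreSQL::Test code, and the function name is hypothetical): when SIGPIPE is
ignored, writing to a pipe that has no reader does not kill the writer; the
write fails with EPIPE ("Broken pipe"), and the caller decides what to do with
that error. Closing the read end here stands in for a child that exited before
the harness wrote to it.

```python
import errno
import os
import signal


def write_to_closed_pipe():
    """Return the errno seen when writing to a reader-less pipe with SIGPIPE ignored."""
    # Ignore SIGPIPE, as the discussion says PostgreSQL::Test does.
    previous = signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # stands in for the child exiting before the write
    try:
        os.write(write_fd, b"data for child")
        return 0
    except OSError as exc:
        # The failure surfaces as an error return rather than a fatal signal.
        return exc.errno
    finally:
        os.close(write_fd)
        if previous is not None:
            signal.signal(signal.SIGPIPE, previous)


if __name__ == "__main__":
    print(errno.errorcode[write_to_closed_pipe()])
```

A caller that sees EPIPE could treat it as "the child is gone" and go on to
collect the exit status instead of dying; whether that is the right behavior
for IPC::Run is the open question.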

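The non-blocking write before exec() suggested earlier in the thread could look
roughly like the sketch below. It is Python for illustration, not a patch
against IPC::Run, and prefill_pipe is a hypothetical helper. The idea is that a
small payload fits in the kernel's pipe buffer, so it is fully written before
the child can possibly exit, and only the leftover bytes remain subject to the
timing-dependent path.

```python
import os


def prefill_pipe(write_fd, pending):
    """Attempt one non-blocking write; return the bytes still left to send."""
    was_blocking = os.get_blocking(write_fd)
    os.set_blocking(write_fd, False)
    try:
        written = os.write(write_fd, pending)
    except BlockingIOError:
        # Pipe buffer is already full; nothing was written this time.
        written = 0
    finally:
        # Restore the original mode so later code sees the descriptor unchanged.
        os.set_blocking(write_fd, was_blocking)
    return pending[written:]


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    leftover = prefill_pipe(write_fd, b"SELECT 1;\n")
    print(len(leftover), os.read(read_fd, 64))
```

Anything returned as leftover would still have to go through the normal write
loop after the child starts, so this narrows the race for small inputs rather
than removing it.
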