Re: BackgroundPsql swallowing errors on windows

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Cc: Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: BackgroundPsql swallowing errors on windows
Date: 2025-02-14 13:14:45
Message-ID: e5d8d598-bf23-4966-91fb-b4043e451d2c@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 2025-02-13 Th 12:39 PM, Andres Freund wrote:
> Hi,
>
> One of the remaining tasks for AIO was to convert the tests to be proper tap
> tests. I did that and was thanked by the tests fairly randomly failing on
> windows. Which test fails changes from run to run.
>
> The symptom is that BackgroundPsql->query() sometimes would simply swallow
> errors that were clearly generated by the backend. Which then would cause
> tests to fail, because testing for errors was the whole point of $test.
>
>
> At first I thought the issue was that psql didn't actually flush stderr after
> displaying errors. And while that may be an issue, it doesn't seem to be the
> one causing problems for me.
>
> Lots of hair-pulling later, I have a somewhat confirmed theory for what's
> happening:
>
> BackgroundPsql::query() tries to detect if the passed in query has executed by
> adding a "banner" after the query and using pump_until() to wait until that
> banner has been "reached". That seems to work reasonably well on !windows.
>
> On windows however, it looks like there's no guarantee that if stdout has been
> received by IPC::Run, stderr also has been received, even if the stderr
> content has been generated first. I tried to add an extra ->pump_nb() call to
> query(), thinking that maybe IPC::Run just didn't get input that had actually
> arrived, due to waiting on just one pipe. But no success.
>
> My understanding is that IPC::Run uses a proxy process on windows to execute
> subprocesses and then communicates with that over TCP (or something along
> those lines). I suspect what's happening is that the communication with the
> external process allows for reordering between stdout/stderr.
>
> And indeed, changing BackgroundPsql::query() to emit the banner on both stdout
> and stderr and waiting on both seems to fix the issue.
>
>
> One complication is that I found that just waiting for the banner, without
> also its newline, sometimes lead to unexpected newlines causing later queries
> to fail. I think that happens if the trailing newline is read separately from
> the rest of the string.
>
> However, matching the newlines caused tests to fail on some machines. After a
> lot of cursing I figured out that for interactive psql we output \r\n, causing
> my regex match to fail. I.e. tests failed whenever IO::PTY was availble...
>
> It's also not correct, as we did before, to just look for the banner, it has
> to be anchored to either the start of the output or a newline, otherwise the
> \echo (or \warn) command itself will be matched by pump_until() (but then the
> replacing the command would fail). Not sure that could cause active problems
> without the addition of \warn (which is also echoed on stdout), but it
> certainly could after.
>
>
> The banner being the same between queries made it hard to understand if a
> banner that appeared in the output was from the current query or a past
> query. Therefore I added a counter to it.
>
>
> For debugging I added a "note" that shows stdout/stderr after executing the
> query, I think it may be worth keeping that, but I'm not sure.
>
>
> This was a rather painful exercise.
>
>

It's been discussed before, but I'd really really like to get rid of
BackgroundPsql. It's ugly, non-intuitive and fragile.

Last year I did some work on this. I was going to present it at Athens
but illness prevented me, and then other life events managed to get in
my way. But the basic work is around. See
<https://github.com/adunstan/test-pq/commit/98518e4929e80fb96f210bbc5aab9fdcea058512>
This introduces a libpq session object (PostgreSQL::Test::Session) which
can be backed either by FFI or a small XS wrapper - the commit has
recipes for both. Missing is a meson.build file for the XS wrapper.
There are significant performance gains to be had too (poll_query_until
is much nicer, for example, as are most uses of safe_psql). If there is 
interest I will bring the work up to date, and maybe we can introduce
this early in the v19 cycle. It would significantly reduce our
dependency on IPC::Run, especially the pump() stuff.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ilia Evdokimov 2025-02-14 13:17:09 Re: Sample rate added to pg_stat_statements
Previous Message Daniel Gustafsson 2025-02-14 12:35:40 Re: BackgroundPsql swallowing errors on windows