From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | Noah Misch <noah(at)leadboat(dot)com> |
Subject: | BackgroundPsql swallowing errors on windows |
Date: | 2025-02-13 17:39:04 |
Message-ID: | wmovm6xcbwh7twdtymxuboaoarbvwj2haasd3sikzlb3dkgz76@n45rzycluzft |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
One of the remaining tasks for AIO was to convert the tests to be proper tap
tests. I did that and was thanked by the tests fairly randomly failing on
windows. Which test fails changes from run to run.
The symptom is that BackgroundPsql->query() sometimes would simply swallow
errors that were clearly generated by the backend. Which then would cause
tests to fail, because testing for errors was the whole point of $test.
At first I thought the issue was that psql didn't actually flush stderr after
displaying errors. And while that may be an issue, it doesn't seem to be the
one causing problems for me.
Lots of hair-pulling later, I have a somewhat confirmed theory for what's
happening:
BackgroundPsql::query() tries to detect if the passed in query has executed by
adding a "banner" after the query and using pump_until() to wait until that
banner has been "reached". That seems to work reasonably well on !windows.
On windows however, it looks like there's no guarantee that if stdout has been
received by IPC::Run, stderr also has been received, even if the stderr
content has been generated first. I tried to add an extra ->pump_nb() call to
query(), thinking that maybe IPC::Run just didn't get input that had actually
arrived, due to waiting on just one pipe. But no success.
My understanding is that IPC::Run uses a proxy process on windows to execute
subprocesses and then communicates with that over TCP (or something along
those lines). I suspect what's happening is that the communication with the
external process allows for reordering between stdout/stderr.
And indeed, changing BackgroundPsql::query() to emit the banner on both stdout
and stderr and waiting on both seems to fix the issue.
One complication is that I found that just waiting for the banner, without
also its newline, sometimes lead to unexpected newlines causing later queries
to fail. I think that happens if the trailing newline is read separately from
the rest of the string.
However, matching the newlines caused tests to fail on some machines. After a
lot of cursing I figured out that for interactive psql we output \r\n, causing
my regex match to fail. I.e. tests failed whenever IO::PTY was availble...
It's also not correct, as we did before, to just look for the banner, it has
to be anchored to either the start of the output or a newline, otherwise the
\echo (or \warn) command itself will be matched by pump_until() (but then the
replacing the command would fail). Not sure that could cause active problems
without the addition of \warn (which is also echoed on stdout), but it
certainly could after.
The banner being the same between queries made it hard to understand if a
banner that appeared in the output was from the current query or a past
query. Therefore I added a counter to it.
For debugging I added a "note" that shows stdout/stderr after executing the
query, I think it may be worth keeping that, but I'm not sure.
This was a rather painful exercise.
Greetings,
Andres Freund
Attachment | Content-Type | Size |
---|---|---|
v1-0001-BackgroundPsql-Fix-potential-for-lost-errors-on-w.patch | text/x-diff | 3.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Степан | 2025-02-13 17:45:05 | Patch: Log parameter types in detailed query logging |
Previous Message | Andrew Atkinson | 2025-02-13 17:13:45 | DROP CONSTRAINT, printing a notice and/or skipping when no action is taken |