| From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Victor Spirin <v(dot)spirin(at)postgrespro(dot)ru> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Sometimes the output to the stdout in Windows disappears |
| Date: | 2020-10-16 16:00:00 |
| Message-ID: | ee02eaa2-03f7-74ea-bbdf-3196e506bae3@gmail.com |
| Lists: | pgsql-hackers |
Hello hackers,
13.09.2020 21:37, Tom Lane wrote:
> I happened to try googling for other similar reports, and I found
> a very interesting recent thread here:
>
> https://github.com/nodejs/node/issues/33166
>
> It might not have the same underlying cause, of course, but it sure
> sounds familiar. If Node.js are really seeing the same effect,
> that would point to an underlying Windows bug rather than anything
> Postgres is doing wrong.
>
> It doesn't look like the Node.js crew got any closer to
> understanding the issue than we have, unfortunately. They made
> their problem mostly go away by reverting a seemingly-unrelated
> patch. But I can't help thinking that it's a timing-related bug,
> and that patch was just unlucky enough to change the timing of
> their tests so that they saw the failure frequently.
I've managed to make a simple reproducer. Please look at the patch attached.
There are two things crucial for reproducing the bug:
ioctlsocket(sock, FIONBIO, &ioctlsocket_ret); // from pgwin32_socket()
and
WSACleanup();
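For readers without the attachment at hand, the two calls can be placed in context with a minimal sketch. This is not the code from the attached patch: the helper name emit_after_nonblocking_socket() is hypothetical, and the placement of the stdout write between the two calls is an assumption. Outside Windows the Winsock part is compiled out and only the write remains.

```c
#include <stdio.h>

#ifdef _WIN32
#include <winsock2.h>           /* link with ws2_32 */
#endif

/*
 * Hypothetical helper, not taken from the attached patch: create a socket,
 * put it into non-blocking mode, write msg to stdout, then shut Winsock down.
 * Returns 0 on success, -1 if any Winsock call fails.
 */
int
emit_after_nonblocking_socket(const char *msg)
{
#ifdef _WIN32
    WSADATA     wsa;
    SOCKET      sock;
    u_long      nonblocking = 1;

    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
        return -1;

    sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (sock == INVALID_SOCKET)
    {
        WSACleanup();
        return -1;
    }

    /* First crucial call: non-blocking mode, as in pgwin32_socket() */
    if (ioctlsocket(sock, FIONBIO, &nonblocking) != 0)
    {
        closesocket(sock);
        WSACleanup();
        return -1;
    }
#endif

    /* The output that the parent process expects to read (assumed order) */
    fputs(msg, stdout);
    fflush(stdout);

#ifdef _WIN32
    closesocket(sock);
    /* Second crucial call: tear Winsock down before the process exits */
    WSACleanup();
#endif
    return 0;
}
```

A program that calls emit_after_nonblocking_socket("test\n") from main() and exits would be the child process whose stdout the test inspects; whether this exact ordering is enough to trigger the glitch is not something the sketch claims.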
I still can't understand what triggers the effect. With this reproducer I
get:
vcregress taptest src\test\modules\connect
...
t/000_connect.pl .. # test
#
t/000_connect.pl .. 13346/100000
# Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 16714/100000
# Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 26216/100000
# Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 30077/100000
# Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 36505/100000
# Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 43647/100000
# Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 53070/100000
# Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 54402/100000
# Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 55685/100000
# Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 83193/100000
# Failed test at t/000_connect.pl line 24.
t/000_connect.pl .. 99992/100000 # Looks like you failed 10 tests of 100000.
t/000_connect.pl .. Dubious, test returned 10 (wstat 2560, 0xa00)
Failed 10/100000 subtests
But in our test farm the pgbench test (from the installcheck-world
suite, which we run under msys) can fail on roughly every third run.
Perhaps it depends on I/O load: searching files or scanning the disk
in parallel seems to increase the probability of the glitch.
I see no solution for this on the postgres side for now, but this
information about Windows quirks could be useful in case someone
else stumbles upon it too.
Best regards,
Alexander
| Attachment | Content-Type | Size |
|---|---|---|
| catch-missing-stdout.patch | text/x-patch | 2.3 KB |