Yet another way for pg_ctl stop to fail on Windows

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Yet another way for pg_ctl stop to fail on Windows
Date: 2024-09-07 12:00:00
Message-ID: ba498e06-d22a-5bfb-389b-031f8db62d7d@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello hackers,

While trying to reproduce a recent fairywren (a Windows animal) failure,
I ran amcheck/amcheck/003_cic_2pc in parallel inside a slowed-down
VM and came across another issue:
### Stopping node "CIC_2PC_test" using mode fast
# Running: pg_ctl -D C:\src\postgresql\build/testrun/amcheck_17/003_cic_2pc\data/t_003_cic_2pc_CIC_2PC_test_data/pgdata
-m fast stop
waiting for server to shut down..... failed
pg_ctl: server does not shut down
# pg_ctl stop failed: 256
# Postmaster PID for node "CIC_2PC_test" is 6048
[08:24:52.915](12.792s) Bail out!  pg_ctl stop failed

So "pg_ctl stop" failed due to not a timeout, but some other reason.

With extra logging added, I got:
### Stopping node "CIC_2PC_test" using mode fast
# Running: pg_ctl -D C:\src\postgresql\build/testrun/amcheck_3/003_cic_2pc\data/t_003_cic_2pc_CIC_2PC_test_data/pgdata
-m fast stop
waiting for server to shut down......!!!pgkill| GetLastError(): 231
postmaster (9596) died untimely? res: -1, errno: 22
 failed

Thus, CallNamedPipe() in pgkill() returned ERROR_PIPE_BUSY (All pipe
instances are busy) and it was handled as an unexpected error.
(The error code 231 returned 10 times out of 10 failures of this ilk for
me.)

Noah, what do you think of handling this error in line with handling of
ERROR_BROKEN_PIPE and ERROR_BAD_PIPE (which was done in 0ea1f2a3a)?

I tried the following change:
        switch (GetLastError())
        {
                case ERROR_BROKEN_PIPE:
                case ERROR_BAD_PIPE:
+               case ERROR_PIPE_BUSY:
and saw no issues.

The reason I'd like to bring your attention to the issue (if you don't
mind), is that it's impossible to understand the reason of such false
failure if it happens in the buildfarm/CI.

Best regards,
Alexander

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marcos Pegoraro 2024-09-07 14:16:20 Re: Detailed release notes
Previous Message Michael Paquier 2024-09-07 11:44:20 Re: Create syscaches for pg_extension