Re: Yet another way for pg_ctl stop to fail on Windows

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Yet another way for pg_ctl stop to fail on Windows
Date: 2024-09-08 15:00:00
Message-ID: da4f9ec1-a272-bd2d-4ca1-6eb58d6dc79a@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

07.09.2024 21:11, Noah Misch wrote:

>
>> Noah, what do you think of handling this error in line with handling of
>> ERROR_BROKEN_PIPE and ERROR_BAD_PIPE (which was done in 0ea1f2a3a)?
>>
>> I tried the following change:
>>         switch (GetLastError())
>>         {
>>                 case ERROR_BROKEN_PIPE:
>>                 case ERROR_BAD_PIPE:
>> +               case ERROR_PIPE_BUSY:
>> and saw no issues.
> That would be a strict improvement over returning EINVAL like we do today. We
> do use PIPE_UNLIMITED_INSTANCES, so I expect the causes of ERROR_PIPE_BUSY are
> process exit and ENOMEM-like situations. While that change is the best thing
> if the process is exiting, it could silently drop the signal in ENOMEM-like
> situations. Consider the following alternative. If sig==0, just return 0
> like you propose, because the process isn't completely gone. Otherwise, sleep
> and retry the signal, like pgwin32_open_handle() retries after certain errors.
> What do you think of that?

Thank you for your attention to the issue!

I agree with your approach. It looks like Microsoft recommends to loop on
ERROR_PIPE_BUSY: [1] (they say "Calling CallNamedPipe is equivalent to
calling the CreateFile ..." at [2]).

So if we aim to not only fix "pg_ctl stop", but to make pgkill() robust,
it's the way to go, IMHO. I'm not sure about an infinite loop they show,
I'd vote for a loop with the same characteristics as in
pgwin32_open_handle().

I've managed to trigger ERROR_PIPE_BUSY with "pg_ctl reload", when running
20 TAP tests (see attached) in parallel (with 20 threads executing
$node->reload() in each), and with the kill() call inside do_reload()
repeated x100 (see the modification patch attached too):
...
# Running: pg_ctl -D .../099_test_pgkill\data/t_099_test_pgkill_node_data/pgdata reload
### Reloading node "node"
# Running: pg_ctl -D .../099_test_pgkill\data/t_099_test_pgkill_node_data/pgdata reload
[13:41:46.850](2.400s) # 18
server signaled
server signaled
server signaled
server signaled
server signaled
server signaled
server signaled
server signaled
!!!pgkill| GetLastError(): 231
pg_ctl: could not send reload signal (PID: 3912), iteration: 81, res: -1, errno: 22: Invalid argument
server signaled
[13:41:52.594](5.744s) # 19
...

[1] https://learn.microsoft.com/en-us/windows/win32/ipc/named-pipe-client
[2] https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-callnamedpipea

Best regards,
Alexander

Attachment Content-Type Size
pgkill-debugging.patch text/x-patch 1.1 KB
099_test_pgkill.pl application/x-perl 1.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Jones 2024-09-08 15:02:27 Re: [PoC] Add CANONICAL option to xmlserialize
Previous Message Oliver Ford 2024-09-08 13:56:47 Re: [PoC] Add CANONICAL option to xmlserialize