From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | "Boyer, Maxime (he/him | il/lui)" <Maxime(dot)Boyer(at)cra-arc(dot)gc(dot)ca> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #18168: Parallel worker failed to initialize: could not create inherited socket: error code 10106 |
Date: | 2023-10-25 21:23:03 |
Message-ID: | CA+hUKGJQrzNrXn1us_sYC9Djh9p7AQ1uPHWAjWhLzhX5YV-35w@mail.gmail.com |
Lists: | pgsql-bugs |
On Thu, Oct 26, 2023 at 3:44 AM Boyer, Maxime (he/him | il/lui)
<Maxime(dot)Boyer(at)cra-arc(dot)gc(dot)ca> wrote:
> > FWIW, the PG code that throws that error message is old enough to vote;
> > it's not something we changed in a recent minor release.
>
> Yeah, that's what I thought :'D
>
> > I am guessing you saw the impact of some external event, but I don't know what.
>
> Fair enough. This happened the day after reverting to 11, because of the memory error on 14, but I also doubt it's related. I was stopping one of the application node at the time. Maybe a Windows thing, or something related to the firmware updates.
Re-bonjour Maxime,
FWIW that comes from WSASocket() trying to inherit/duplicate a socket
used for communication with the pgstat process (a process and a socket
that don't exist in PostgreSQL 15, where that mechanism was replaced
with a new shared memory system; but given you were trying to upgrade
to 14 you probably don't want to hear about 15 today...).
I have no idea why that would happen, but for the record the manual[1] says:
"WSAEPROVIDERFAILEDINIT
10106
Service provider failed to initialize. The requested service provider
could not be loaded or initialized. This error is returned if either a
service provider's DLL could not be loaded (LoadLibrary failed) or the
provider's WSPStartup or NSPStartup function failed."
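As a reading aid for reports like this one, here is a minimal sketch in C of a lookup from a Winsock error number to its symbolic name. The helper name `winsock_error_name` and the selection of table entries are assumptions made for illustration; only the numeric values are taken from the Winsock error-code list. On an actual Windows machine, the system's own message lookup would be the more usual route.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: maps a handful of Winsock
 * error numbers to their symbolic names.  The table is deliberately
 * partial; it is not a complete list of Winsock errors. */
struct winsock_error
{
    int         code;
    const char *name;
};

static const struct winsock_error winsock_errors[] = {
    {10093, "WSANOTINITIALISED"},
    {10104, "WSAEINVALIDPROCTABLE"},
    {10105, "WSAEINVALIDPROVIDER"},
    {10106, "WSAEPROVIDERFAILEDINIT"},
    {10107, "WSASYSCALLFAILURE"},
};

/* Returns the symbolic name for code, or NULL if it is not in the table. */
const char *
winsock_error_name(int code)
{
    size_t i;

    for (i = 0; i < sizeof(winsock_errors) / sizeof(winsock_errors[0]); i++)
    {
        if (winsock_errors[i].code == code)
            return winsock_errors[i].name;
    }
    return NULL;
}
```

Calling `winsock_error_name(10106)` yields the name quoted above; an unknown code yields NULL, so callers should check before printing.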
That seems pretty low level. If this were PostgreSQL's fault I
suppose it would have to come from corruption of the WSAPROTOCOL_INFO
struct (a sort of cookie we need to duplicate the socket), but I doubt
it. I see there were a few reports years ago about this error message
from pre-parallel-query times. It's interesting that you see this
specifically with parallel workers (which inherit only a pgstat
socket), not with the client connection socket. The pgstat socket is
different in that it is a UDP socket. I wonder if there is something
special about UDP that is upsetting your network stack, perhaps a
firewall somewhere that enforces some limit on UDP activity. But I'm
not a Windows guy so I have no real clue.
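For readers unfamiliar with the mechanism described above, here is a rough sketch in C of how a socket can be handed to another process on Windows through a WSAPROTOCOL_INFO struct. This is not PostgreSQL's actual code: the function names `export_socket`, `import_socket` and `format_inherit_error` are hypothetical, and the Windows-only part assumes WSAStartup() has already succeeded and that the program is linked against the Winsock library. Only WSADuplicateSocket(), WSASocket(), FROM_PROTOCOL_INFO and WSAGetLastError() are real Winsock names.

```c
#include <stdio.h>

/* Hypothetical helper: formats the message seen in the bug report's
 * subject line for a given Winsock error code. */
int
format_inherit_error(char *buf, size_t len, int code)
{
    return snprintf(buf, len,
                    "could not create inherited socket: error code %d",
                    code);
}

#ifdef _WIN32
#include <winsock2.h>
#include <windows.h>

/* Parent side (hypothetical name): fill *info so that the process with
 * id child_pid can recreate sock.  Returns 0 on success. */
int
export_socket(SOCKET sock, DWORD child_pid, WSAPROTOCOL_INFO *info)
{
    return WSADuplicateSocket(sock, child_pid, info);
}

/* Child side (hypothetical name): rebuild the socket from the
 * WSAPROTOCOL_INFO received from the parent.  A failure here is where
 * an error such as 10106 would surface. */
SOCKET
import_socket(WSAPROTOCOL_INFO *info)
{
    SOCKET s = WSASocket(FROM_PROTOCOL_INFO, FROM_PROTOCOL_INFO,
                         FROM_PROTOCOL_INFO, info, 0, 0);

    if (s == INVALID_SOCKET)
    {
        char msg[80];

        format_inherit_error(msg, sizeof(msg), WSAGetLastError());
        fprintf(stderr, "%s\n", msg);
    }
    return s;
}
#endif                          /* _WIN32 */
```

The struct is opaque to the caller, which is why it behaves like a cookie: the parent only fills it in and ships it to the child, and the child only passes it back to Winsock. On non-Windows platforms only the message formatter is compiled.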
[1] https://learn.microsoft.com/en-us/windows/win32/winsock/windows-sockets-error-codes-2