Re: Refactoring postmaster's code to cleanup after child exit

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Refactoring postmaster's code to cleanup after child exit
Date: 2024-10-05 11:45:46
Message-ID: 217d43af-0287-4769-a825-cde4cfa00e6c@iki.fi
Lists: pgsql-hackers

On 05/10/2024 01:03, Thomas Munro wrote:
> On Sat, Oct 5, 2024 at 7:41 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> My test for dead-end backends opens 20 TCP (or unix domain) connections
>> to the server, in quick succession. That works fine on my system, and it
>> passed cirrus CI on other platforms, but on FreeBSD it failed
>> repeatedly. The behavior in that scenario is apparently
>> platform-dependent: it depends on the accept queue size, but what
>> happens when you reach the queue size also seems to depend on the
>> platform. On my Linux system, the connect() calls in the client are
>> blocked if the server doesn't call accept() fast enough, but
>> apparently you get an error on *BSD systems.
>
> Right, we've analysed that difference in AF_UNIX implementation
> before[1], which shows up in the real world, where client sockets, i.e.
> libpq's, are usually non-blocking, as EAGAIN on Linux (which is not
> valid per POSIX) vs ECONNREFUSED on other OSes. All fail to connect,
> but the error message is different.

Thanks for the pointer!

> For blocking AF_UNIX client sockets like in your test, Linux
> effectively has an infinite queue made from two layers. The listen
> queue (a queue of connecting sockets) does respect the requested
> backlog size, but when it's full it has an extra trick: the connect()
> call waits (in a queue of threads) for space to become free in the
> listen queue, so it's effectively unlimited (but only for blocking
> sockets), while FreeBSD and I suspect any other implementation
> deriving from or reimplementing the BSD socket code gives you
> ECONNREFUSED. macOS behaves just the same as FreeBSD AFAICT, so I
> don't know why you didn't see the same thing... I guess it's just
> racing against accept() draining the queue.

In fact I misremembered: the failure happened on macOS, *not* FreeBSD.
It could be just luck I didn't see it on FreeBSD though.
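
To make the platform difference concrete, here is a minimal Python sketch
(not part of the patch or its test suite; the function name, backlog value
and attempt limit are made up for illustration). It listens on an AF_UNIX
socket, never calls accept(), and opens non-blocking client connections
until one fails. Going by the analysis quoted above, I'd expect the reported
error to be EAGAIN on Linux and ECONNREFUSED on the BSD-derived systems, but
how many connections succeed first depends on how each kernel interprets the
backlog.

```python
import errno
import os
import socket
import tempfile


def probe_full_backlog(backlog=2, max_attempts=64):
    """Connect to a listener that never accepts, until connect() fails.

    Returns (connections_made, errno_name). errno_name is None if every
    attempt succeeded. All names and defaults here are illustrative.
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "s")
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    clients = []
    try:
        listener.bind(path)
        listener.listen(backlog)  # deliberately never accept()
        for _ in range(max_attempts):
            client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            clients.append(client)
            # Non-blocking, like libpq's client sockets usually are.
            client.setblocking(False)
            try:
                client.connect(path)
            except OSError as e:
                name = errno.errorcode.get(e.errno, str(e.errno))
                return len(clients) - 1, name
        return len(clients), None
    finally:
        for client in clients:
            client.close()
        listener.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    made, err = probe_full_backlog()
    print(f"{made} connections queued, then connect() failed with {err}")
```

With blocking client sockets, as in my test, the same loop would instead
hang in connect() on Linux once the queue is full, which is why the test
only failed where the kernel reports an error.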

> It's possible that Windows copied the Linux behaviour for AF_UNIX,
> given that it probably has something to do with the WSL project for
> emulating Linux, but IDK.

Sadly, IO::Socket::UNIX hasn't been implemented on Windows (or at least
not in the Perl distribution we're using in Cirrus CI):

Socket::pack_sockaddr_un not implemented on this architecture at
C:/strawberry/5.26.3.1/perl/lib/Socket.pm line 872.

so I'll have to disable this test on Windows anyway.

--
Heikki Linnakangas
Neon (https://neon.tech)
