Misleading "epoll_create1 failed: Too many open files"

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Misleading "epoll_create1 failed: Too many open files"
Date: 2024-11-26 15:10:51
Message-ID: xjjx7r4xa7beixuu4qtkdhnwdbchrrpo3gaeb3jsbinvvdiat5@cwjw55mna5of
Lists: pgsql-hackers

Hi,

I ran something that triggered the error in $subject, except it turns out
that:
a) epoll_create1() was not being called
b) we didn't actually hit EMFILE or even max_safe_fds

The reason for the failure is that we have:
    if (!AcquireExternalFD())
    {
        /* treat this as though epoll_create1 itself returned EMFILE */
        elog(ERROR, "epoll_create1 failed: %m");
    }

and

    bool
    AcquireExternalFD(void)
    {
        /*
         * We don't want more than max_safe_fds / 3 FDs to be consumed for
         * "external" FDs.
         */
        if (numExternalFDs < max_safe_fds / 3)
        {
            ReserveExternalFD();
            return true;
        }
        errno = EMFILE;
        return false;
    }

I think it's rather confusing to claim that epoll_create1() failed when we
didn't even call it.

Why are we misattributing the failure to a system call that we didn't make?

The current behaviour was introduced in

commit 3d475515a15f70a4a3f36fbbba93db6877ff8346
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: 2020-02-24 17:28:33 -0500

    Account explicitly for long-lived FDs that are allocated outside fd.c.

I also wish we wouldn't report EMFILE when we didn't actually reach any hard
limit - that makes the system behaviour unnecessarily confusing. But that's
not quite so easy to fix.

How about making the error message something like:

    elog(ERROR, "AcquireExternalFD, for epoll_create1, failed: %m");
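
To make the difference concrete, here is a minimal standalone sketch, not
PostgreSQL code. It assumes hypothetical stand-ins (acquire_external_fd,
release_external_fd, format_fd_error, and an illustrative max_safe_fds of 30)
that only mimic the shape of the functions quoted above. It shows the
reservation being refused at max_safe_fds / 3, well below any kernel limit,
and how the reworded message would name the step that actually failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for fd.c state; the values are illustrative only. */
static int max_safe_fds = 30;
static int num_external_fds = 0;

/*
 * Same shape as the AcquireExternalFD() quoted above: refuse the reservation
 * once a third of max_safe_fds is in use, and signal that via errno = EMFILE
 * even though no system call was made.
 */
static bool
acquire_external_fd(void)
{
    if (num_external_fds < max_safe_fds / 3)
    {
        num_external_fds++;
        return true;
    }
    errno = EMFILE;
    return false;
}

/* Give back one previously reserved slot. */
static void
release_external_fd(void)
{
    assert(num_external_fds > 0);
    num_external_fds--;
}

/*
 * Build the message a caller would report. When the reservation step failed,
 * say so instead of blaming epoll_create1(), which was never called.
 */
static int
format_fd_error(char *buf, size_t len, bool reservation_failed, int err)
{
    if (reservation_failed)
        return snprintf(buf, len,
                        "AcquireExternalFD, for epoll_create1, failed: %s",
                        strerror(err));
    return snprintf(buf, len, "epoll_create1 failed: %s", strerror(err));
}
```

With these assumed values, the eleventh call to acquire_external_fd() fails
with errno set to EMFILE although the process holds only ten such
reservations, and format_fd_error(buf, sizeof(buf), true, errno) produces a
message that starts with "AcquireExternalFD, for epoll_create1, failed:".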

Greetings,

Andres Freund
