From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Misleading "epoll_create1 failed: Too many open files" |
Date: | 2024-11-26 15:10:51 |
Message-ID: | xjjx7r4xa7beixuu4qtkdhnwdbchrrpo3gaeb3jsbinvvdiat5@cwjw55mna5of |
Lists: | pgsql-hackers |
Hi,
I ran something which triggered the error in $subject. Except that it turns
out that
a) epoll_create1() was not being called
b) we didn't actually hit EMFILE or even max_safe_fds
The reason for the failure is that we have:
    if (!AcquireExternalFD())
    {
        /* treat this as though epoll_create1 itself returned EMFILE */
        elog(ERROR, "epoll_create1 failed: %m");
    }
and
    bool
    AcquireExternalFD(void)
    {
        /*
         * We don't want more than max_safe_fds / 3 FDs to be consumed for
         * "external" FDs.
         */
        if (numExternalFDs < max_safe_fds / 3)
        {
            ReserveExternalFD();
            return true;
        }
        errno = EMFILE;
        return false;
    }
I think it's rather confusing to claim that epoll_create1() failed when we
didn't even call it.
Why are we misattributing the failure to a system call that we didn't make?
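To make the mechanism concrete, here is a minimal standalone sketch of how the message can appear without any system call being made. It is a model, not PostgreSQL code: acquire_external_fd_model(), format_failure() and the value chosen for max_safe_fds are assumptions for illustration, and strerror() stands in for elog's %m.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for fd.c state; the value 30 is arbitrary. */
static int max_safe_fds = 30;
static int numExternalFDs = 0;

/*
 * Same shape as the AcquireExternalFD() quoted above, with
 * ReserveExternalFD() modelled as a plain counter increment.
 */
static bool
acquire_external_fd_model(void)
{
    if (numExternalFDs < max_safe_fds / 3)
    {
        numExternalFDs++;
        return true;
    }
    /* Soft limit reached: errno is set by hand, no syscall is involved. */
    errno = EMFILE;
    return false;
}

/* Hypothetical helper: render the message roughly as %m would. */
static void
format_failure(char *buf, size_t buflen)
{
    snprintf(buf, buflen, "epoll_create1 failed: %s", strerror(errno));
}
```

With max_safe_fds = 30 the first ten calls succeed and the eleventh fails. Calling format_failure() at that point yields the text from $subject (on glibc, strerror(EMFILE) is typically "Too many open files"), even though the process may be nowhere near its real descriptor limit.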
The current behaviour was introduced in
commit 3d475515a15f70a4a3f36fbbba93db6877ff8346
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date:   2020-02-24 17:28:33 -0500

    Account explicitly for long-lived FDs that are allocated outside fd.c.
I also wish we wouldn't report EMFILE when we didn't actually reach any hard
limit - that makes the system behaviour unnecessarily confusing. But that's
not quite so easy to fix.
How about making the error message something like:
    elog(ERROR, "AcquireExternalFD, for epoll_create1, failed: %m");
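For a sense of what the reworded message would look like once %m is expanded, here is a small standalone sketch. format_external_fd_failure() and its purpose parameter are hypothetical, introduced only for illustration; strerror() again stands in for %m.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not a PostgreSQL function: builds the proposed
 * wording, naming the bookkeeping function that refused the FD and the
 * call it was guarding.
 */
static void
format_external_fd_failure(char *buf, size_t buflen, const char *purpose)
{
    snprintf(buf, buflen, "AcquireExternalFD, for %s, failed: %s",
             purpose, strerror(errno));
}
```

With errno set to EMFILE and purpose set to "epoll_create1", the result begins with "AcquireExternalFD, for epoll_create1, failed: ". That points at the function that actually failed, though the errno text itself still describes a hard limit that was not reached.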
Greetings,
Andres Freund