Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
Date: 2024-02-08 21:59:20
Message-ID: 20240208215920.myib264liwbkdndp@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

On 2024-02-08 12:04:24 -0500, Tom Lane wrote:
> Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> writes:
> > IIUC I think the assertion failure was caused by an
> > error-during-error-recovery loop caused by the "epoll_create1 failed:
> > Too many open files" error raised in WaitLatchOrSocket called from
> > pgfdw_get_cleanup_result, which is called during abort cleanup. I
> > think a simple fix to avoid such a loop is to modify the PG_CATCH
> > block in pgfdw_get_cleanup_result so that it just ignores the passed
> > error, not re-throwing it, and restores InterruptHoldoffCount and the
> > memory context, like the attached. In the patch I also modified
> > callers of pgfdw_get_cleanup_result to issue a warning when ignoring
> > the error. I might be missing something, though.
>
> I do not think ignoring the passed error is *ever* acceptable.
> You have no idea what the error condition is or whether your
> hack is sufficient to recover from it.

+1

I think we ought to understand *why* we are getting the "Too many open
files". The AcquireExternalFD() in CreateWaitEventSet() should prevent
that.

One annoying bit is that AcquireExternalFD() failing emits the same error as
if epoll_create1() itself failing, including the same errno. So we don't know
if the problem is that there are too many connections and because of that we
are running out of "file descriptor slots", or whether something was using up
file descriptors outside of our system, or ...

I also wonder if postgres_fdw should strive to use a longer lived wait event
set. For efficiency, if nothing else? That'd avoid the need to create one
during error handling.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2024-02-08 22:06:46 Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds
Previous Message Tom Lane 2024-02-08 17:04:24 Re: BUG #17828: postgres_fdw leaks file descriptors on error and aborts aborted transaction in lack of fds