Re: too-many-open-files log file entries when vacuuming under solaris

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Raschick, Hartmut" <Hartmut(dot)Raschick(at)keymile(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: too-many-open-files log file entries when vacuuming under solaris
Date: 2014-03-05 20:16:50
Message-ID: 18754.1394050610@sss.pgh.pa.us
Lists: pgsql-general

"Raschick, Hartmut" <Hartmut(dot)Raschick(at)keymile(dot)com> writes:
> recently we have seen a lot of occurrences of "out of file descriptors:
> Too many open files; release and retry" in our postgres log files, every
> night when a "vacuum full analyze" is run. After some digging into the
> code we found that postgres potentially tries to open as many as a
> pre-determined maximum number of file descriptors when vacuuming. That
> number is the lesser of the one from the configuration file
> (max_files_per_process) and the one determined at start-up by
> "src/backend/storage/file/fd.c::count_usable_fds()". Under Solaris now,
> it would seem, finding out that number via dup(0) is not sufficient, as
> the actual number of interest might be/is the number of usable stream
> file descriptors (up until Solaris 10, at least). Also, closing the last
> recently used file descriptor might therefore not solve a temporary
> problem (as something below 256 is needed). Now, this can be fixed by
> setting/leaving the descriptor limit at 256 or changing the
> postgresql.conf setting accordingly. Still, the function for determining
> the max number is not working as intended under Solaris, it would
> appear. One might try using fopen() instead of dup() or have a different
> handling for stream and normal file descriptors (including moving
> standard file descriptors to above 255 to leave room for stream
> ones). Maybe though, all this is not worth the effort; then it might
> perhaps be a good idea to mention the limitations/specialties in the
> platform specific notes (e.g. have u/limit at 256 maximum).

TBH this sounds like unfounded speculation. AFAIK a Postgres backend will
not open anything but regular files after its initial startup. I'm not
sure what a "stream" is on Solaris, but guessing that it refers to pipes
or sockets, I don't think we have a problem with an OS restriction that
those be below FD 256. In any case, if we did, it would presumably show
up as errors, not release-and-retry events.

Our usual experience is that you get release-and-retry log messages when
the OS is up against the system-wide open-file limit rather than the
per-process limit (i.e., the underlying error code is ENFILE, not EMFILE).
I don't know exactly how Solaris strerror() spells those codes so it's
difficult to tell from your reported log message which case is happening.
If it is the system-wide limit that's at issue, then of course the dup(0)
loop isn't likely to find it, and adjusting max_files_per_process (or
maybe better, reducing max_connections) is the expected solution.
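
To make the ENFILE/EMFILE distinction concrete, here is a minimal sketch of a
dup(0) probe loop that records which errno stopped it. This is not the code in
fd.c; count_dupable_fds() and classify_fd_error() are hypothetical names
invented for illustration, and the sketch assumes stdin (FD 0) is open.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, NOT PostgreSQL's count_usable_fds(): count how many
 * extra descriptors this process can obtain by dup()ing stdin, trying at
 * most max_to_probe times.  *stop_errno receives the errno that ended the
 * loop, or 0 if the probe limit was reached first.  Returns -1 only if the
 * bookkeeping array cannot be allocated.
 */
static int
count_dupable_fds(int max_to_probe, int *stop_errno)
{
    int *fds;
    int  used = 0;

    assert(max_to_probe > 0);
    *stop_errno = 0;

    fds = malloc(sizeof(int) * (size_t) max_to_probe);
    if (fds == NULL)
        return -1;

    while (used < max_to_probe)
    {
        int fd = dup(0);

        if (fd < 0)
        {
            *stop_errno = errno;    /* remember why we had to stop */
            break;
        }
        fds[used++] = fd;
    }

    /* Release everything we grabbed so the process is left unchanged. */
    for (int i = 0; i < used; i++)
        close(fds[i]);
    free(fds);

    return used;
}

/* Hypothetical helper: name the limit that an errno value points at. */
static const char *
classify_fd_error(int err)
{
    if (err == EMFILE)
        return "per-process limit (EMFILE)";
    if (err == ENFILE)
        return "system-wide limit (ENFILE)";
    return "other";
}
```

A loop like this can only hit ENFILE if the system-wide table happens to be
nearly full at the moment it runs, which is why a startup-time probe says
little about what happens during a nightly vacuum. Printing strerror(EMFILE)
and strerror(ENFILE) on the machine in question would show how that platform
spells each code in the log.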

regards, tom lane
