Re: Bump soft open file limit (RLIMIT_NOFILE) to hard limit on startup

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Bump soft open file limit (RLIMIT_NOFILE) to hard limit on startup
Date: 2025-02-11 22:33:45
Message-ID: a4c0388f-02f8-4e5a-9638-616aabf3f9e3@vondra.me
Lists: pgsql-hackers

On 2/11/25 21:18, Tom Lane wrote:
> Tomas Vondra <tomas(at)vondra(dot)me> writes:
>> I did run into bottlenecks due to "too few file descriptors" during
>> a recent experiment with partitioning, which made it pretty trivial
>> to get into a situation where we start thrashing the VfdCache. I
>> have a half-written draft of a blog post about that somewhere.
>
>> But my conclusion was that it's damn difficult to even realize that's
>> happening, especially if you don't have access to the OS / perf, etc.
>
> Yeah. fd.c does its level best to keep going even with only a few FDs
> available, and it's hard to tell that you have a performance problem
> arising from that. (Although I recall old war stories about Postgres
> continuing to chug along just fine after it'd run the kernel out of
> FDs, although every other service on the system was crashing left and
> right, making it difficult e.g. even to log in. That scenario is why
> I'm resistant to pushing our allowed number of FDs to the moon...)
>
>> So
>> my takeaway was we should improve that first, so that people have a
>> chance to realize they have this issue, and can do the tuning. The
>> improvements I thought about were:
>
>> - track hits/misses for the VfdCache (and add a system view for that)
>
> I think what we actually would like to know is how often we have to
> close an open FD in order to make room to open a different file.
> Maybe that's the same thing you mean by "cache miss", but it doesn't
> seem like quite the right terminology. Anyway, +1 for adding some way
> to discover how often that's happening.
>

We can count the evictions (closing one file so that we can open
another) too, but AFAICS that's essentially the same as counting
"misses" (opening a file after not finding it in the cache): once the
cache warms up, every miss forces an eviction, so the two counts should
track each other.

Or am I missing something?
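
A standalone toy illustrates the point (this is not fd.c code, just a
sketch of the counting; the cache size, file count, and eviction policy
are arbitrary):

/*
 * Toy cache of "file slots" with hit/miss/eviction counters. Once the
 * cache is full, every miss forces an eviction, so the two counters
 * differ only by the warm-up.
 */
#include <stdio.h>

#define CACHE_SIZE   10
#define NFILES       1000
#define NACCESSES    100000

static int  cache[CACHE_SIZE];      /* which "file" each slot holds */
static int  nslots = 0;
static long hits, misses, evictions;

static void
access_file(int fileno)
{
    for (int i = 0; i < nslots; i++)
    {
        if (cache[i] == fileno)
        {
            hits++;
            return;
        }
    }
    misses++;
    if (nslots < CACHE_SIZE)
        cache[nslots++] = fileno;   /* free slot: no eviction needed */
    else
    {
        evictions++;                /* full: close one file, open another */
        cache[0] = fileno;          /* (evict slot 0 for simplicity) */
    }
}

int
main(void)
{
    for (long i = 0; i < NACCESSES; i++)
        access_file((int) (i % NFILES));    /* cycle through many files */
    printf("hits=%ld misses=%ld evictions=%ld\n", hits, misses, evictions);
    return 0;
}

With this access pattern you get almost no hits and evictions equal to
misses minus CACHE_SIZE, i.e. the two counters are interchangeable once
the cache is warm.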

>> - maybe have wait event for opening/closing file descriptors
>
> Not clear that that helps, at least for this specific issue.
>

I don't think Jelte described any specific issue, but the symptoms I
observed were a query touching ~1000 relations (partitions + their
indexes), thrashing the VfdCache with ~0% cache hits, and the
open/close calls taking a lot of time (~25% of CPU time). That would
be very visible as a wait event, I believe.
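
For reference, the tuning this thread's subject is about amounts to
something like this standalone sketch (illustrative only, not the
actual patch):

/*
 * Raise the soft RLIMIT_NOFILE to the hard limit at startup, so a
 * conservative soft limit doesn't cap the number of open files.
 */
#include <stdio.h>
#include <sys/resource.h>

int
main(void)
{
    struct rlimit rlim;

    if (getrlimit(RLIMIT_NOFILE, &rlim) != 0)
    {
        perror("getrlimit");
        return 1;
    }
    printf("soft=%ld hard=%ld\n", (long) rlim.rlim_cur, (long) rlim.rlim_max);

    rlim.rlim_cur = rlim.rlim_max;  /* bump soft limit to the hard limit */
    if (setrlimit(RLIMIT_NOFILE, &rlim) != 0)
    {
        perror("setrlimit");
        return 1;
    }
    return 0;
}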

>> - show max_safe_fds value somewhere, not just max_files_per_process
>> (which we may silently override and use a lower value)
>
> Maybe we should just assign max_safe_fds back to max_files_per_process
> after running set_max_safe_fds? The existence of two variables is a
> bit confusing anyhow. I vaguely recall that we had a reason for
> keeping them separate, but I can't think of the reasoning now.
>

That might work. I don't know what the original reasons for keeping
them separate were; presumably there were some.
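
To make the "silently override" part concrete, here is a standalone
sketch (not fd.c itself, though it probes in a similar spirit) of how
the number of actually usable FDs can end up below the configured
ceiling; CONFIGURED_MAX is a stand-in for max_files_per_process:

/*
 * Probe how many FDs are actually openable by dup()ing until failure,
 * then take the minimum of that and the configured ceiling.
 */
#include <stdio.h>
#include <unistd.h>

#define CONFIGURED_MAX 1000     /* stand-in for max_files_per_process */

int
main(void)
{
    int fds[4096];
    int n = 0;

    while (n < 4096)
    {
        int fd = dup(0);        /* duplicate stdin just to consume an FD */

        if (fd < 0)
            break;
        fds[n++] = fd;
    }
    for (int i = 0; i < n; i++)
        close(fds[i]);

    /* the effective limit is the smaller of the two */
    printf("configured=%d usable=%d effective=%d\n",
           CONFIGURED_MAX, n, n < CONFIGURED_MAX ? n : CONFIGURED_MAX);
    return 0;
}

IIUC set_max_safe_fds() ends up with roughly that: the minimum of the
configured value and what's actually openable, minus some reserve,
which is exactly the value users currently can't see.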

regards

--
Tomas Vondra
