Re: BUG #18675: Postgres is not realasing memory causing OOM

From: Maciej Jaros <eccenux(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #18675: Postgres is not realasing memory causing OOM
Date: 2024-10-29 12:26:09
Message-ID: CAOn6LZqgcnXFAdkoxRUTnH5B0g_JN0EmCNERvRREpT0_ZCvBog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Thanks, Tomas. That helped me understand the problem and the comments a bit
more. To answer some questions about our setup and possible causes of the
issues:

- We are not using any non-standard extensions. We use PL/pgSQL in some
maintenance scripts, but that extension is built-in, so I guess you didn't
mean that.
- This is mostly a default configuration (aside from memory and CPU
adjustments). More specifically all of the JIT options are in their default
states (none are set).
- For context about connections, we have 10 app servers (Tomcat, JDK 17,
Hibernate 5) connecting to the database in question, each with around 20
active connections in practice (pooling, though some schedules might add a
bit). There is also a data warehouse with pooling and should not exceed 20
connections. So, in practice, we have around 250 connections, not the 500
we have in max_connections setting. Also most of the connections are
idle most of the time. So at least our max_connections is quite
conservative, I think.
- We terminate all queries running longer than 30 minutes. Typical
queries are below 1 second, and Java responses are mostly limited to 20
seconds. Additionally, most queries have a limit of 25 (25 items per page).
- The application is in use from 8 am to 6-7 pm, and it is mostly idle
at night. There is some maintenance at night (including vacuum and
vacuumlo). RAM availability stays flat at night, and I would expect it to
drop at some point around 7-8 pm. RAM usage on separate app servers does
drop after hours.

So, yes, that RAM usage is strange, and that’s why I reported it. It
doesn’t seem like this is a problem unique to us. I found questions about
memory usage on Stack Overflow, like this one, for example: How to limit the
memory available for PostgreSQL server
<https://stackoverflow.com/questions/28844170/how-to-limit-the-memory-that-is-available-for-postgresql-server>.
There is a comment there that seems to describe what could be a bug (looks
like a bug to me). Maybe not a bug-bug, but definitely an unwanted behavior:

Note that even if postgres logically releases memory it has allocated, it
may not be returned to operating system depending on the malloc()/free()
implementation of your execution environment. That may result in multiple
PostgreSQL processes getting over the limit due use of hash aggregation as
described above and the memory is never released to OS even though
PostgreSQL isn't actually using it either. This happens because technically
malloc() may use brk() behind the scenes and releasing memory back to OS is
only possible only in some special cases.

So, that comment led me to suggest adding some kind of process. I called it
a garbage collector, but maybe David is right; maybe that’s not accurate.
Anyway, that process, in my view, could try to actually release memory to
the system to prevent the OOM killer from doing its bidding. Is that
possible? I don’t know, don't know inner workings of PG. I also don’t
understand why calling free would not release memory. I’m also not sure if
that description of malloc/free is accurate, but it does seem to align with
what I’m seeing.

pon., 28 paź 2024 o 20:26 Tomas Vondra <tomas(at)vondra(dot)me> napisał(a):

> On 10/28/24 19:07, Maciej Jaros wrote:
> > David G. Johnston (28.10.2024 14:42):
> >> On Monday, October 28, 2024, PG Bug reporting form
> >> <noreply(at)postgresql(dot)org> wrote:
> >>
> >> The following bug has been logged on the website:
> >>
> >> Bug reference: 18675
> >> Logged by: Maciej Jaros
> >> Email address: eccenux(at)gmail(dot)com
> >> PostgreSQL version: 16.4
> >> Operating system: Ubuntu 22.04
> >> Description:
> >>
> >>
> >> or maybe
> >> PostgreSQL should include garbage collection?
> >>
> >>
> >> Garbage collection is typically used in relation to a programming
> >> language feature to make writing applications in those languages
> >> easier. Applications themselves don’t really implement garbage
> >> collection. And C, the language PostgreSQL, is written in, doesn’t
> >> have garbage collection. To our knowledge, though, there are no
> >> significant memory leaks in supported versions.
> >>
> >>
> >> RAMforPG = shared_buffers + (temp_buffers + work_mem) *
> >> max_connections;
> >>
> >>
> >> The expression: work_mem * max_connections is incorrect. See the doc
> >> for work_mem for how it is used.
> >>
> >> There is so much more info needed to conclude there is a bug here -
> >> which there probably is not. Exploring the query and tuning the
> >> system is better discussed on the -general mailing list.
> >>
> >> David J.
> >>
> >
> > Could you share what would be the correct expression to calculate or at
> > least estimate max RAM usage then? I've checked and haven't found
> > anything in the docs. I've found that expression in user space. I know
> > autovac might need to be accounted for, but as said we are not using it.
> > How would this estimation of 20GB go to 50GB?
> >
>
> Unfortunately there's no universal formula, because it depends on what
> queries you run. For example a query that needs to do 10 sorts may need
> to use 10 x work_mem, and so on. Yes, this is unfortunate, we'd like to
> have a per-session memory limit, but we don't have that. So the only
> recommendation is to set these limits conservatively, not too close to
> the available memory limit.
>
> Also, if you really found a memory leak, these formulas are pointless. A
> memory leak is usually about "breaking" such limits, and we may not even
> know about all memory that gets allocated (for external libraries).
>
> > There just seem to be no limit in RAM usage so it does seem like a
> > memory leak. It just grows until there is no more RAM available an we
> > restart the service. There are same operations, same connections
> > (pooling on the Java side) and it just grows everyday. It seem to be a
> > memory leak.It doesn't seem to have an end.
> >
>
> The question is how you define a memory leak. All memory allocated by a
> query (using "our" infrastructure) is tied to a "memory context" and
> should be released at the end. It's possible for a query to allocate a
> lot of memory, perhaps even not release right away, but it should be
> released at the end of a query. I'm not going to rule out a bug that
> breaks this (e.g. by using a long-lived memory context), but it's very
> unlikely we'd not find that pretty soon.
>
> Also, the memory leak seems to be permanent - in your chart the memory
> usage grows over a week. Presumably your queries are shorter than that,
> so that's not consistent with this type of memory leak.
>
> What I think might be more likely is that you're using something that
> allocates memory by directly calling malloc() - say, an extension using
> some external library, etc. These allocations are completely outside our
> control, and if not freed explicitly, would be a "normal" memory leak.
>
> This is why people were asking about JIT earlier. JIT relies on llvm,
> which allocates memory directly, and so a bug in LLVM could leak memory
> like this. The first thing to do is to disable JIT, and see if the
> memory leak goes away. If it does, then we know it's either a bug in
> LLVM, or in how we use it (e.g. we may not trigger cleanup).
>
> But all this is just a wild guess. We don't even know if you're using
> some other extensions which might also leak memory, etc.
>
>
> regards
>
> --
> Tomas Vondra
>
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2024-10-29 16:13:52 Re: BUG #18675: Postgres is not realasing memory causing OOM
Previous Message David Rowley 2024-10-29 12:22:26 Re: BUG #18677: numeric values in arrays are stored incorrectly