Re: Trim the heap free memory

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: shawn wang <shawn(dot)wang(dot)pg(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Trim the heap free memory
Date: 2024-12-08 18:48:38
Message-ID: 15c14d9a-3575-41d9-9875-dee5d4118006@vondra.me
Lists: pgsql-hackers

On 12/8/24 05:23, Tomas Vondra wrote:
> On 9/18/24 04:56, shawn wang wrote:
>> Thank you very much for your response and suggestions.
>>
>> As you mentioned, the patch here is actually designed for glibc's
>> ptmalloc2 and is not applicable to other platforms. I will consider
>> supporting it only on the Linux platform in the future. In the memory
>> management strategy of ptmalloc2, there is a certain amount of non-
>> garbage-collected memory, which is closely related to the order and
>> method of memory allocation and release. To reduce the performance
>> overhead caused by frequent allocation and release of small blocks of
>> memory, ptmalloc2 intentionally retains this part of the memory. The
>> malloc_trim function locks, traverses memory blocks, and uses madvise to
>> release this part of the memory, but this process may also have a
>> negative impact on performance. In the process of exploring solutions, I
>> also considered a variety of strategies, including scheduling
>> malloc_trim to be executed at regular intervals or triggering
>> malloc_trim after a specific number of free operations. However, we
>> found that these methods are not optimal solutions.
>>
>> We can see that out of about 43K test queries, 32K saved nothing
>> whatever, and in only four was more than a couple of meg saved.
>> That's pretty discouraging IMO.  It might be useful to look closer
>> at the behavior of those top four though.  I see them as
>>
>>
>> I have previously encountered situations where the non-garbage-collected
>> memory of wal_sender was approximately hundreds of megabytes or even
>> exceeded 1GB, but I was unable to reproduce this situation using simple
>> SQL. Therefore, I introduced an asynchronous processing function, hoping
>> to manage memory more efficiently without affecting performance.
>>  
>
> I doubt a system function is the right approach to deal with these
> memory allocation issues. The function has to be called by the user,
> which means the user is expected to monitor the system and decide when
> to invoke the function. That seems far from trivial - it would require
> collecting OS-level information about memory usage, and I suppose it'd
> need to happen fairly often to actually help with OOM reliably.
>
>>
>> In addition, I have considered the following optimization strategies:
>>
>> 1.
>>
>> Adjust the configuration of ptmalloc2 through the mallopt function
>> to use mmap rather than sbrk for memory allocation. This can
>> immediately return the memory to the operating system when it is
>> released, but it may affect performance due to the higher overhead
>> of mmap.
>>
>
> Sure, forcing the system to release memory more aggressively may affect
> performance - that's the tradeoff done by glibc. But calling the new
> pg_trim_backend_heap_free_memory() function is not free either.
>
> But why would it force the memory to be returned immediately?
> The decision whether to trim memory is driven by M_TRIM_THRESHOLD, and
> that does not need to be 0. In fact, it's 128kB by default, i.e. glibc
> trims memory automatically, if it can trim at least 128kB.
>
> Yes, by default the thresholds are adjusted dynamically, which I guess
> is one way to get excessive memory usage that could have been solved by
> calling malloc_trim(). But setting the option to any value disables the
> dynamic behavior; it doesn't need to be set to 0.
>
>
>> 2.
>>
>> Use other memory allocators such as jemalloc or tcmalloc, and adjust
>> relevant parameters to reduce the generation of non-garbage-
>> collected memory. However, these allocators are designed for multi-
>> threaded use and may lead to increased memory usage per process.
>>
>
> Right, that's kinda the opposite of trying to not waste memory.
>
> But it also suggests syscalls (done by malloc) may be a problem under
> high concurrency - not just with multi-threading, but even with regular
> processes. And for glibc that matters too, of course - in fact, it may
> be pretty important to allow glibc to cache more memory (by increasing
> M_TOP_PAD) to get good throughput in certain workloads ...
>
>> 3.
>>
>> Build a set of memory context allocation functions
>> based on mmap, delegating the responsibility of memory management
>> entirely to the database level. Although this solution can
>> effectively control memory allocation, it requires a large-scale
>> engineering implementation.
>>
>
> Why would it be complex? You could just as well set M_MMAP_THRESHOLD to
> some low value, so that all malloc() calls are handled by mmap()
> internally. Not sure it's a good idea, though.
>
>> I look forward to further discussing these solutions with you and
>> exploring the best memory management practices together.
>>
>
> Adjusting the glibc malloc() behavior may be important, but I don't
> think a system function is a good approach. It's possible to change the
> behavior by setting environment variables, which is pretty easy, but
> maybe we could have something that does the same using mallopt().
>
> That's what Ronan Dunklau proposed in thread [1] a year ago ... I like
> that approach much more, it's much simpler for the user.
>

To propose something less abstract / more tangible, I think we should do
something like this:

1) add a bit of code for glibc-based systems that adjusts selected
malloc parameters using mallopt() during startup

2) add a GUC that enables this, with the default being the regular glibc
behavior (with dynamic adjustment of various thresholds)

Exactly which parameters this would set is an open question, but based
on my earlier experiments, Ronan's earlier patches, etc., I think it
should adjust at least:

M_TRIM_THRESHOLD - to make sure we trim the heap regularly
M_TOP_PAD - to make sure we cache some allocated memory

I wonder if we should also tune M_MMAP_THRESHOLD, which on 64-bit
systems may grow up to 32MB, so we don't really mmap() very often for
regular memory contexts. But I don't know if that's a good idea; that
would need some experiments.
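
For illustration, here's a rough standalone sketch of what 1) might look
like on glibc. The GUC name and the threshold values are placeholders I
made up for this example, not proposed defaults:

/*
 * Standalone sketch (not actual PostgreSQL code) of the startup tuning
 * in 1).  The GUC name and the concrete values are hypothetical.
 */
#include <stdio.h>
#include <stdlib.h>

#ifdef __GLIBC__
#include <malloc.h>
#endif

/* hypothetical GUC: glibc_malloc_tuning = on/off, default off */
static int	glibc_malloc_tuning = 1;

static void
apply_glibc_malloc_tuning(void)
{
#ifdef __GLIBC__
	if (!glibc_malloc_tuning)
		return;					/* keep glibc's dynamic defaults */

	/*
	 * Setting either of these disables glibc's dynamic adjustment of
	 * the thresholds.  The values are placeholders, not recommendations.
	 */
	if (mallopt(M_TRIM_THRESHOLD, 1024 * 1024) == 0)
		fprintf(stderr, "mallopt(M_TRIM_THRESHOLD) failed\n");
	if (mallopt(M_TOP_PAD, 4 * 1024 * 1024) == 0)
		fprintf(stderr, "mallopt(M_TOP_PAD) failed\n");

	/* possibly also M_MMAP_THRESHOLD, pending experiments */
	/* mallopt(M_MMAP_THRESHOLD, 2 * 1024 * 1024); */
#endif
}

int
main(void)
{
	apply_glibc_malloc_tuning();

	/* do a bit of work so the settings have something to act on */
	for (int i = 0; i < 1000; i++)
		free(malloc(64 * 1024));

	return 0;
}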

I believe that's essentially what Ronan Dunklau proposed, but it
stalled. Not because of some inherent complexity, but because of
concerns about introducing glibc-specific code.

Based on my recent experiments I think it's clearly worth it (esp. with
high-concurrency workloads). If glibc were a niche, it'd be a different
situation, but I'd guess the vast majority of databases run on glibc.
Yes, it's possible to make these changes without new code (e.g. by
setting the environment variables), but that's rather inconvenient.

Perhaps it'd be possible to make it a bit smarter by looking at malloc
stats and adjusting the trim/pad thresholds, but I'd leave that for the
future. It might even lead to the same issues with excessive memory
usage as the dynamic logic built into glibc.

But maybe we could at least print / provide some debugging information?
That would help with adjusting the GUC ...
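
As a strawman for that debugging output, here's a tiny standalone demo
using the glibc-specific mallinfo2() (glibc 2.33+) to show how much free
memory the allocator is sitting on - again just a sketch, not a patch:

/*
 * Standalone demo: fragment the heap, then report glibc's view of it
 * before and after an explicit malloc_trim().  glibc-only.
 */
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>

static void
print_heap_stats(const char *label)
{
	struct mallinfo2 mi = mallinfo2();

	fprintf(stderr, "%s: arena=%zu in-use=%zu free=%zu trimmable=%zu mmapped=%zu\n",
			label, mi.arena, mi.uordblks, mi.fordblks, mi.keepcost, mi.hblkhd);
}

int
main(void)
{
	static void *chunks[100000];

	/* allocate many small chunks, then free every other one */
	for (int i = 0; i < 100000; i++)
		chunks[i] = malloc(1024);
	for (int i = 0; i < 100000; i += 2)
		free(chunks[i]);

	print_heap_stats("after free");
	malloc_trim(0);				/* what the proposed SQL function does */
	print_heap_stats("after malloc_trim(0)");

	return 0;
}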

regards

--
Tomas Vondra
