Re: Add GUC to tune glibc's malloc implementation.

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter(at)eisentraut(dot)org>, tomas(dot)vondra(at)enterprisedb(dot)com
Subject: Re: Add GUC to tune glibc's malloc implementation.
Date: 2023-06-28 05:26:03
Message-ID: 1900350.taCxCBeP46@aivenlaptop
Lists: pgsql-hackers

On Tuesday, 27 June 2023, 20:17:46 CEST, Andres Freund wrote:
> > Yes this is probably much more appropriate, but a much larger change with
> > greater risks of regression. Especially as we have to make sure we're not
> > overfitting our own code for a specific malloc implementation, to the
> > detriment of others.
>
> I think your approach is fundamentally overfitting our code to a specific
> malloc implementation, in a way that's not tunable by mere mortals. It just
> seems like a dead end to me.

I see it as a way to have *some* sort of control over the malloc
implementation we use, instead of tuning our allocation patterns on top of it
while treating it entirely as a black box. As for the tuning, I proposed
earlier to replace this parameter, expressed in terms of size, with a "profile"
(greedy / conservative) to make it easier to pick a sensible value.
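
To make the "profile" idea concrete, here is a minimal sketch of how a
two-value setting could be translated into glibc tunables. It is not taken
from the patch: the enum, the function names and the mapping itself (greedy
keeps up to work_mem at the top of the heap, conservative stays at the 128 kB
default documented in mallopt(3)) are assumptions for illustration. Only
mallopt(), M_TRIM_THRESHOLD and M_TOP_PAD are real glibc interfaces.

```c
#include <assert.h>
#include <limits.h>
#include <malloc.h>   /* glibc-specific: mallopt(), M_TRIM_THRESHOLD, M_TOP_PAD */
#include <stddef.h>

/* Hypothetical profile names, for illustration only. */
typedef enum MallocProfile
{
    MALLOC_PROFILE_CONSERVATIVE,
    MALLOC_PROFILE_GREEDY
} MallocProfile;

/*
 * Assumed mapping from a profile to a size in bytes: "greedy" keeps as much
 * as work_mem around, "conservative" uses the documented glibc default.
 */
static size_t
profile_to_bytes(MallocProfile profile, size_t work_mem_bytes)
{
    if (profile == MALLOC_PROFILE_GREEDY)
        return work_mem_bytes;
    return 128 * 1024;
}

/*
 * Apply the profile to the current process.  Returns 1 if both mallopt()
 * calls succeeded, 0 otherwise (mallopt() itself returns 1 on success).
 */
static int
apply_malloc_profile(MallocProfile profile, size_t work_mem_bytes)
{
    size_t      bytes;
    int         value;

    assert(work_mem_bytes > 0);
    bytes = profile_to_bytes(profile, work_mem_bytes);

    /* mallopt() takes an int, so clamp very large settings. */
    value = bytes > (size_t) INT_MAX ? INT_MAX : (int) bytes;

    return mallopt(M_TRIM_THRESHOLD, value) == 1 &&
           mallopt(M_TOP_PAD, value) == 1;
}
```

A backend would call something like apply_malloc_profile(MALLOC_PROFILE_GREEDY,
work_mem_bytes) once at startup or when the setting changes. On a non-glibc
platform the whole thing would have to compile down to a no-op.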

>
> > Unless you were hinting that we should write our own directly instead?
>
> I don't think we should write our own malloc - we don't rely on it much
> ourselves. And if we replace it, we need to care about malloc's performance
> characteristics a whole lot, because various libraries etc do heavily rely
> on it.
>
> However, I do think we should eventually avoid using malloc() for aset.c et
> al. malloc() is a general allocator, but at least for allocations below
> maxBlockSize aset.c doesn't do allocations in a way that really benefits
> from that *at all*. It's not a lot of work to do such allocations on our
> own.
>
> > > We e.g. could keep a larger number of memory blocks reserved
> > > ourselves. Possibly by delaying the release of additionally held blocks
> > > until we have been idle for a few seconds or such.
> >
> > I think keeping work_mem around after it has been used a couple of times
> > makes sense. This is the memory a user is willing to dedicate to
> > operations, after all.
>
> The biggest overhead of returning pages to the kernel is that that triggers
> zeroing the data during the next allocation. Particularly on multi-node
> servers that's surprisingly slow. It's most commonly not the brk() or
> mmap() themselves that are the performance issue.
>
> Indeed, with your benchmark, I see that most of the time, on my dual Xeon
> Gold 5215 workstation, is spent zeroing newly allocated pages during page
> faults. That microarchitecture is worse at this than some others, but it's
> never free (or cache friendly).

I'm not sure I see the practical difference between those, but that's
interesting. Were you able to reproduce my results?

> FWIW, in my experience trimming the brk()ed region doesn't work reliably
> enough in real world postgres workloads to be worth relying on (from a
> memory usage POV). Sooner or later you're going to have longer lived
> allocations placed that will prevent it from happening.

I'm not sure I follow: given that our workload is clearly split at query and
transaction boundaries, and that we release memory at those points, I've
assumed (and noticed in practice, albeit not on a production system) that most
memory at the top of the heap would be trimmable, as we don't keep much in
between queries / transactions.

>
> I have played around with telling aset.c that certain contexts are long
> lived and using mmap() for those, to make it more likely that the libc
> malloc/free can actually return memory to the system. I think that can be
> quite worthwhile.

So if I understand your different suggestions, we should:
- use mmap ourselves for what we deem to be "one-off" allocations, to make
sure that memory is not hanging around after we stop using it
- keep some pool allocated which will not be freed in between queries, but
reused the next time we need it.
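
The two points above could be sketched roughly as follows. This is only an
illustration under assumptions, not code from aset.c or from any patch:
BlockPool, POOL_BLOCK_SIZE, POOL_MAX_CACHED and all function names are
hypothetical, and the block size and cache depth are arbitrary. One-off
allocations go straight to mmap() and are unmapped on release, while pooled
blocks are cached across queries and only handed back to the kernel when the
cache is full or when the backend decides it has been idle long enough.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define POOL_BLOCK_SIZE (1024 * 1024)   /* hypothetical block size */
#define POOL_MAX_CACHED 4               /* hypothetical cache depth */

/* Blocks kept around between queries instead of being returned at once. */
typedef struct BlockPool
{
    void       *cached[POOL_MAX_CACHED];
    int         ncached;
} BlockPool;

/* "One-off" allocation: taken directly from the kernel. */
static void *
oneoff_alloc(size_t size)
{
    void       *p;

    assert(size > 0);
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Releasing a one-off allocation gives the pages back immediately. */
static void
oneoff_free(void *p, size_t size)
{
    munmap(p, size);
}

/* Reuse a cached block if there is one, otherwise map a new one. */
static void *
pool_get_block(BlockPool *pool)
{
    if (pool->ncached > 0)
        return pool->cached[--pool->ncached];
    return oneoff_alloc(POOL_BLOCK_SIZE);
}

/* Keep the block for the next query unless the cache is already full. */
static void
pool_put_block(BlockPool *pool, void *block)
{
    if (pool->ncached < POOL_MAX_CACHED)
        pool->cached[pool->ncached++] = block;
    else
        oneoff_free(block, POOL_BLOCK_SIZE);
}

/* To be called after some idle period: drop everything still cached. */
static void
pool_release_idle(BlockPool *pool)
{
    while (pool->ncached > 0)
        oneoff_free(pool->cached[--pool->ncached], POOL_BLOCK_SIZE);
}
```

A reused block has already been faulted in, so it avoids the page zeroing
cost mentioned upthread, at the price of holding on to the memory until
pool_release_idle() runs.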

Thank you for looking at this problem.

Regards,

--
Ronan Dunklau
