Re: Identify huge pages accessibility using madvise

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Gabriele Bartolini <gabriele(dot)bartolini(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Identify huge pages accessibility using madvise
Date: 2024-11-08 08:54:09
Message-ID: 5qlqcyxxn34a4ljs6a65azcilec7753hvv6aesdneyqghim5wf@wtvlypkumsdj
Lists: pgsql-hackers

> On Thu, Sep 26, 2024 at 08:46:17AM GMT, Dmitry Dolgov wrote:
> > On Thu, Sep 26, 2024 at 07:57:12AM GMT, Gabriele Bartolini wrote:
> > Hi Dmitry,
> >
> > I've been attempting to replicate this issue directly in Kubernetes, but I
> > haven't been successful so far. I've been using EKS nodes, and it seems
> > that they all run cgroup v2 now. Do you have anything that could help me
> > get started on this more quickly?
>
> Thanks for testing. I can check if I can get some EKS clusters to
> experiment with. In the meantime, what about the reproducing script for
> cgroup v2 (the plain one that I've attached with the patch, that doesn't
> require any k8s cluster), doesn't it work for you?

Looks like there is a plot twist. After talking to Gabriele off-list and
testing on EKS, I've discovered that since 5.7 the Linux kernel supports
hugetlb reservation via hugetlbfs [1]. That means that, together with the
original limitation at page fault time, there is now one at reservation time,
which has a separate knob in cgroupfs:

# cgroup v2, hugetlb controller
#
# original limit, page fault level
hugetlb.2MB.limit_in_bytes
#
# new one, reservation level
hugetlb.2MB.rsvd.limit_in_bytes

This means that there could still be people facing the original issue the patch
is trying to address: for that, one needs to either run an older kernel or have
a container orchestration tool that does not set the rsvd value (looks like
there are such examples). But in the long term I would expect everyone to
converge on using reservation limits correctly, so maybe the patch is not
needed after all.
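
To check which of the two situations a given container is in, something along
these lines could be used. This is only a rough sketch, not part of the patch:
the helper names are made up for illustration, the file names are taken from
the listing above, and the suffix is a parameter because the knob names may
differ depending on the cgroup version in use.

```python
from pathlib import Path
from typing import Dict, Optional, Union


def read_knob(cgroup_dir: Path, name: str) -> Optional[str]:
    """Return the stripped contents of a control file, or None if unreadable."""
    try:
        return (cgroup_dir / name).read_text().strip()
    except (FileNotFoundError, PermissionError):
        return None


def hugetlb_limits(
    cgroup_dir: Union[str, Path],
    size: str = "2MB",
    suffix: str = "limit_in_bytes",  # assumption: name as in the listing above
) -> Dict[str, Optional[str]]:
    """Read the page fault level and reservation level hugetlb limits.

    A value of None means the corresponding file is absent or unreadable,
    e.g. the kernel predates the reservation counter.
    """
    d = Path(cgroup_dir)
    return {
        "fault": read_knob(d, f"hugetlb.{size}.{suffix}"),
        "rsvd": read_knob(d, f"hugetlb.{size}.rsvd.{suffix}"),
    }


def has_rsvd_knob(cgroup_dir: Union[str, Path], size: str = "2MB",
                  suffix: str = "limit_in_bytes") -> bool:
    """True if the reservation level knob exists in this cgroup directory."""
    return hugetlb_limits(cgroup_dir, size, suffix)["rsvd"] is not None
```

Calling hugetlb_limits() on the container's cgroup directory (the exact path
depends on the setup) and comparing the two values shows whether only the page
fault limit is in effect. Whether an existing rsvd value counts as "set" still
has to be judged by comparing it with the fault limit.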

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cdc2fcfea79b9873bb63159f8ed973f4046018c8
