Re: Draft for basic NUMA observability

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Draft for basic NUMA observability
Date: 2025-04-07 16:42:21
Message-ID: cpzkjrlcgage2api6hushndya6i2yq7omjhga7tfp4ba3goyyb@53ot6clau7ij
Lists: pgsql-hackers

Hi,

On 2025-04-07 18:36:24 +0200, Tomas Vondra wrote:
> > Forcing all those pages to be allocated via pg_numa_touch_mem_if_required()
> > itself wouldn't be too bad - in fact I'd rather like to have an explicit way
> > of doing that. The problem is that that leads to all those allocations
> > happening on the *current* NUMA node (unless you have started postgres with
> > numactl --interleave=all or such), rather than on the node where the normal
> > first use would have allocated it.
> >
>
> I agree, forcing those allocations to happen on a single node seems
> rather unfortunate. But really, how likely is it that someone will run
> this function on a cluster that hasn't already allocated this memory?

I think it's not at all unlikely to have parts of shared buffers unused at the
start of a benchmark, e.g. because the table sizes grow over time.

> I'm not saying it can't happen, but we already have this issue if you
> start and do a warmup from a single connection ...

Indeed! We really need to fix this...
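(For benchmarking setups, starting the postmaster under something like
"numactl --interleave=all pg_ctl start" at least spreads the pages evenly
across nodes - but that's a workaround, not a fix.)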

> >
> >> It's just that we don't have the memory mapped in the current backend, so
> >> I'd bet people would not be happy with NULL, and would proceed to force the
> >> allocation in some other way (say, a large query of some sort), which
> >> obviously causes a lot of other problems.
> >
> > I don't think that really would be the case with what I proposed? If any
> > buffer in the region were valid, we would force the allocation to become known
> > to the current backend.
> >
>
> It's not quite clear to me what exactly you are proposing :-(
>
> I believe you're referring to this:
>
> > The only allocation where that really matters is shared_buffers. I wonder if
> > we could special case the logic for that, by only probing if at least one of
> > the buffers in the range is valid.
> >
> > Then we could treat a page status of -ENOENT as "page is not mapped" and
> > display NULL for the node_id?
> >
> > Of course that would mean that we'd always need to call
> > pg_numa_touch_mem_if_required(), not just the first time round, because we
> > previously might not have done so for a page that is now valid. But compared
> > to the cost of actually allocating pages, the cost of that seems small.
>
> I suppose by "range" you mean buffers on a given memory page

Correct.

> and "valid" means BufferIsValid.

I was thinking of checking if the BufferDesc indicates BM_VALID or
BM_TAG_VALID.

BufferIsValid() just does a range check :(.
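
Roughly along these lines (untested sketch, helper name made up; uses the
usual buf_internals.h machinery):

/* true if any buffer on this memory page has, or is acquiring, contents */
static bool
page_has_valid_buffer(int first_buffer, int buffers_per_page)
{
	for (int i = 0; i < buffers_per_page; i++)
	{
		BufferDesc *hdr = GetBufferDescriptor(first_buffer + i);
		uint32		state = pg_atomic_read_u32(&hdr->state);

		/* BM_TAG_VALID also catches buffers whose contents are in flight */
		if (state & (BM_VALID | BM_TAG_VALID))
			return true;
	}
	return false;
}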

> Yeah, that probably means the memory page is allocated. But if the buffer is
> invalid, that does not mean the memory is not allocated, right? So does that
> make the buffer not interesting?

Well, if you don't have contents in it, it can't really affect performance.
But yeah, I agree, it's not perfect either.
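
To spell out the -ENOENT part (sketch only; nr_pages/pages/nulls/node_ids are
placeholders, and the real code would go through the pg_numa wrapper rather
than calling libnuma directly):

/* numa_move_pages() with nodes = NULL just queries; status[i] < 0 is -errno */
int		   *status = palloc(sizeof(int) * nr_pages);

if (numa_move_pages(0, nr_pages, pages, NULL, status, 0) == 0)
{
	for (int i = 0; i < nr_pages; i++)
	{
		if (status[i] == -ENOENT)
			nulls[i] = true;	/* page not mapped -> node_id IS NULL */
		else if (status[i] >= 0)
			node_ids[i] = status[i];	/* NUMA node the page lives on */
	}
}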

> I think we need to decide whether the current patches are good enough
> for PG18, with the current behavior, and then maybe improve that in
> PG19.

I think as long as the docs mention this with a <note> or <warning>, it's OK
for now.

Greetings,

Andres Freund
