From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Draft for basic NUMA observability
Date: 2025-04-07 15:51:29
Message-ID: y4zhgypa4vt3txf22yzvkfe2m4rgrph25ms6ax2ukduwcl43u3@dosysiprwsha
Lists: pgsql-hackers
Hi,
On 2025-04-06 13:56:54 +0200, Tomas Vondra wrote:
> On 4/6/25 01:00, Andres Freund wrote:
> > On 2025-04-05 18:29:22 -0400, Andres Freund wrote:
> >> I think one thing that the docs should mention is that calling the numa
> >> functions/views will force the pages to be allocated, even if they're
> >> currently unused.
> >>
> >> Newly started server, with s_b of 32GB and 2MB huge pages:
> >>
> >> grep ^Huge /proc/meminfo
> >> HugePages_Total: 34802
> >> HugePages_Free: 34448
> >> HugePages_Rsvd: 16437
> >> HugePages_Surp: 0
> >> Hugepagesize: 2048 kB
> >> Hugetlb: 76517376 kB
> >>
> >> run
> >> SELECT node_id, sum(size) FROM pg_shmem_allocations_numa GROUP BY node_id;
> >>
> >> Now the pages that previously were marked as reserved are actually allocated:
> >>
> >> grep ^Huge /proc/meminfo
> >> HugePages_Total: 34802
> >> HugePages_Free: 18012
> >> HugePages_Rsvd: 1
> >> HugePages_Surp: 0
> >> Hugepagesize: 2048 kB
> >> Hugetlb: 76517376 kB
> >>
> >>
> >> I don't see how we can avoid that right now, but at the very least we ought to
> >> document it.
> >
> > The only allocation where that really matters is shared_buffers. I wonder if
> > we could special case the logic for that, by only probing if at least one of
> > the buffers in the range is valid.
> >
> > Then we could treat a page status of -ENOENT as "page is not mapped" and
> > display NULL for the node_id?
> >
> > Of course that would mean that we'd always need to
> > pg_numa_touch_mem_if_required(), not just the first time round, because we
> > previously might not have touched a page that is now valid. But compared to
> > the cost of actually allocating pages, the cost for that seems small.
> >
>
> I don't think this would be a good trade off. The buffers already have a
> NUMA node, and users would be interested in that.
The thing is that the buffer might *NOT* have a NUMA node. That's the case in
the example above, for instance - otherwise we wouldn't initially have seen
the large HugePages_Rsvd.
Forcing all those pages to be allocated via pg_numa_touch_mem_if_required()
itself wouldn't be too bad - in fact I'd rather like to have an explicit way
of doing that. The problem is that it causes all those allocations to happen
on the *current* numa node (unless you have started postgres with
numactl --interleave=all or such), rather than the node where the normal first
use would have allocated it.
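
For illustration, the touch boils down to a plain volatile read that faults
the page in. A minimal sketch of that idea (touch_page is a made-up helper,
not the actual pg_numa_touch_mem_if_required() implementation), assuming
Linux's default first-touch placement:

    #include <stdint.h>

    /*
     * Fault a page in by reading from it.  Under Linux's default
     * first-touch policy the kernel backs the page with memory on the
     * NUMA node of the CPU this process currently runs on - which is
     * exactly the skew described above when a monitoring backend does
     * the touching.
     */
    static inline void
    touch_page(const void *ptr)
    {
        volatile uint64_t dummy;

        dummy = *(volatile const uint64_t *) ptr;
        (void) dummy;
    }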
> It's just that we don't have the memory mapped in the current backend, so
> I'd bet people would not be happy with NULL, and would proceed to force the
> allocation in some other way (say, a large query of some sort). Which
> obviously causes a lot of other problems.
I don't think that really would be the case with what I proposed? If any
buffer in the region were valid, we would force the allocation to become known
to the current backend.
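
To sketch what that probing could look like (report_page_nodes is a
hypothetical helper, not PostgreSQL code): move_pages(2) with a NULL nodes
argument only queries placement, without migrating or allocating anything,
and a status of -ENOENT indicates the page has no backing memory yet:

    #include <errno.h>
    #include <stdio.h>
    #include <numaif.h>     /* move_pages(); link with -lnuma */

    /* Report each page's NUMA node, mapping "not mapped" to NULL. */
    static void
    report_page_nodes(void **pages, unsigned long npages)
    {
        int status[npages];

        /* nodes == NULL: just ask where each page currently lives */
        if (move_pages(0, npages, pages, NULL, status, 0) < 0)
        {
            perror("move_pages");
            return;
        }

        for (unsigned long i = 0; i < npages; i++)
        {
            if (status[i] == -ENOENT)
                printf("page %lu: not mapped -> node_id NULL\n", i);
            else if (status[i] < 0)
                printf("page %lu: error %d\n", i, -status[i]);
            else
                printf("page %lu: node %d\n", i, status[i]);
        }
    }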
Greetings,
Andres Freund