From: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
---|---|
To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de> |
Subject: | Re: Draft for basic NUMA observability |
Date: | 2025-02-17 12:02:04 |
Message-ID: | CAKZiRmzgaN-vZeoDjSHCbavU7dDyBLa1Vyp4sW=WQaZ4R43mvw@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Feb 13, 2025 at 4:28 PM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
Hi Bertrand,
Thanks for playing with this!
> Which makes me wonder if using numa_move_pages()/move_pages is the right approach. Would be curious to know if you observe the same behavior though.
You are correct, I'm observing identical behaviour, please see attached.
> Forcing the allocation to happen inside a monitoring function is decidedly not great.
We would probably need to split it out into a separate, new view
within the pg_buffercache extension. That is going to be slow, but it
would still provide valid results. In the previous approach
get_mempolicy() was allocating on first access, and it was slow not
only because it was allocating but also because it needed one syscall
per address (yikes!). I struggle to imagine how e.g. scanning (really
allocating) a 128GB buffer cache in the future wouldn't cause issues:
that's something like 16-17 million (* 2) syscalls to issue when not
using move_pages(2).
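As a rough sanity check on those numbers, here is a minimal sketch of the arithmetic. It assumes the default 8kB block size and 4kB OS pages (neither is stated above, and the "(* 2)" is read here as two OS pages per buffer). The helper names and the batch size of 1024 are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-size units (buffers or OS pages) that fit in cache_bytes. */
static uint64_t
units_in_cache(uint64_t cache_bytes, uint64_t unit_bytes)
{
    assert(unit_bytes > 0);
    return cache_bytes / unit_bytes;
}

/*
 * Syscalls needed to cover naddrs addresses when a single call can handle
 * up to "batch" addresses. batch = 1 models one syscall per address;
 * a larger batch models handing an array of addresses to one call.
 */
static uint64_t
syscalls_needed(uint64_t naddrs, uint64_t batch)
{
    assert(batch > 0);
    return (naddrs + batch - 1) / batch;    /* round up */
}
```

Under those assumptions, units_in_cache(128ULL << 30, 8192) gives 16777216 buffers and units_in_cache(128ULL << 30, 4096) gives 33554432 OS pages. Covering them one address at a time takes 33554432 calls, while a hypothetical batch of 1024 addresses per call takes 32768.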
Another thing is that numa_maps(5) won't help us much either (not
enough granularity).
> But maybe we could use get_mempolicy() only on "valid" buffers i.e ((buf_state & BM_VALID) && (buf_state & BM_TAG_VALID)), thoughts?
Different perspective: I wanted to use the same approach in the new
pg_shmemallocations_numa, but that won't cut it there. The other idea
that came to my mind is to issue move_pages() from the backend that
has already used all of those pages. That literally means one of the
ideas below:
1. from somewhere like checkpointer / bgwriter?
2. add touching memory on backend startup like always (sic!)
3. or just attempt to read/touch the memory address just before
calling move_pages(). E.g. this last option is just two lines:
    if(os_page_ptrs[blk2page+j] == 0) {
+       volatile uint64 touch pg_attribute_unused();
        os_page_ptrs[blk2page+j] = (char *)BufHdrGetBlock(bufHdr) +
            (os_page_size*j);
+       touch = *(uint64 *)os_page_ptrs[blk2page+j];
    }
and it seems to work while still issuing far fewer syscalls with
move_pages() across backends, well at least here.
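For readers who want to try the touch-then-query idea outside of PostgreSQL, here is a standalone, Linux-only sketch. It is not the patch: it uses a small shared anonymous mapping as a stand-in for shared buffers, calls the raw move_pages syscall instead of libnuma's numa_move_pages() to avoid a library dependency, and all helper names (touch_pages, query_page_nodes, demo_touch_then_query) are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define DEMO_MAX_PAGES 64       /* arbitrary cap for this demo */

/* Read one word from each address so the kernel faults the page in. */
static size_t
touch_pages(void **pages, size_t npages)
{
    size_t i;

    for (i = 0; i < npages; i++)
    {
        volatile uint64_t touch = *(volatile uint64_t *) pages[i];

        (void) touch;
    }
    return npages;
}

/*
 * Query-only move_pages(2): with nodes == NULL nothing is migrated and
 * status[i] receives the NUMA node of pages[i] or a negative errno.
 * One syscall covers the whole array.
 */
static long
query_page_nodes(void **pages, int *status, size_t npages)
{
#ifdef SYS_move_pages
    return syscall(SYS_move_pages, 0, (unsigned long) npages,
                   pages, NULL, status, 0);
#else
    errno = ENOSYS;
    return -1;
#endif
}

/*
 * Map npages of shared anonymous memory, touch them, then ask where they
 * live. Returns how many pages reported a node, or -1 if the mapping or
 * the syscall failed (e.g. no NUMA support, or blocked by a sandbox).
 */
static long
demo_touch_then_query(size_t npages)
{
    void   *pages[DEMO_MAX_PAGES];
    int     status[DEMO_MAX_PAGES];
    long    page_size = sysconf(_SC_PAGESIZE);
    long    resident = -1;
    char   *mem;
    size_t  i;

    assert(npages > 0 && npages <= DEMO_MAX_PAGES);
    if (page_size <= 0)
        return -1;

    mem = mmap(NULL, npages * (size_t) page_size, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    for (i = 0; i < npages; i++)
    {
        pages[i] = mem + i * (size_t) page_size;
        status[i] = -ENOENT;
    }

    touch_pages(pages, npages);

    if (query_page_nodes(pages, status, npages) == 0)
    {
        resident = 0;
        for (i = 0; i < npages; i++)
            if (status[i] >= 0)
                resident++;
    }

    munmap(mem, npages * (size_t) page_size);
    return resident;
}
```

Calling demo_touch_then_query(4) from a small main() and comparing the result with and without the touch_pages() call is one way to check the behaviour discussed above on a given kernel. Results may differ with huge pages or in restricted containers.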
Frankly speaking, I do not know which path to take with this; maybe
that's good enough?
-J.
Attachment | Content-Type | Size |
---|---|---|
numa_test.txt | text/plain | 1.5 KB |