Re: Draft for basic NUMA observability

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Draft for basic NUMA observability
Date: 2025-03-17 16:11:27
Message-ID: Z9hJrzKUdMwk0HQP@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Mon, Mar 17, 2025 at 08:28:46AM +0100, Jakub Wartak wrote:
> On Fri, Mar 14, 2025 at 1:08 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > I did prepare a patch file (attached as .txt to not disturb the cfbot) to apply
> > on top of v11 0002 (I just rebased it a bit so that it now applies on top of
> > v12 0002).
>
> Hey Bertrand,
>
> all LGTM (good ideas), so here's v13 attached with applied all of that
> (rebased, tested).

Thanks for v13!

Looking at 0003:

=== 1

+ <entry>NUMA mappings for shared memory allocations</entry>

s/NUMA mappings/NUMA node mappings/ maybe?

=== 2

+ <para>
+ The <structname>pg_shmem_numa_allocations</structname> view shows NUMA nodes
+ assigned allocations made from the server's main shared memory segment.

What about?

"
shows how shared memory allocations in the server's main shared memory segment
are distributed across NUMA nodes" ?

=== 3

+ <structfield>numa_zone_id</structfield> <type>int4</type>

s/numa_zone_id/zone_id? to be consistent with pg_buffercache_numa introduced in
0002.

BTW, I wonder if "node_id" would be better (to match the descriptions...).
If so, would also need to be done in 0002.

=== 4

+ ID of NUMA node

<acronym>NUMA</acronym> node ID? (to be consistent with 0002).

=== 5

+static bool firstUseInBackend = true;

Let's use firstNumaTouch to be consistent with 0002.

=== 6

+ elog(NOTICE, "libnuma initialization failed or NUMA is not supported on this platform, some NUMA data might be unavailable.");;

There is 2 ";" + I think that we should used the same wording as in
pg_buffercache_numa_pages().

=== 7

What about using ERROR instead? (like in pg_buffercache_numa_pages())

=== 8

+ /*
+ * This is for gathering some NUMA statistics. We might be using various
+ * DB block sizes (4kB, 8kB , .. 32kB) that end up being allocated in
+ * various different OS memory pages sizes, so first we need to understand
+ * the OS memory page size before calling move_pages()
+ */
+ os_page_size = pg_numa_get_pagesize();

Maybe use the same comment as the one in pg_buffercache_numa_pages() before calling
pg_numa_get_pagesize()?

=== 9

+ max_zones = pg_numa_get_max_node();

I think we are mixing "zone" and "node". I think we should standardize on one
and use it everywhere (code and doc for both 0002 and 0003). I'm tempted to
vote for node, but zone is fine too if you prefer.

=== 10

+ /*
+ * Preallocate memory all at once without going into details which shared
+ * memory segment is the biggest (technically min s_b can be as low as
+ * 16xBLCKSZ)
+ */

What about?

"
Allocate memory for page pointers and status based on total shared memory size.
This simplified approach allocates enough space for all pages in shared memory
rather than calculating the exact requirements for each segment.
" instead?

=== 11

+ int shm_total_page_count,
+ shm_ent_page_count,

I think those 2 should be uint64.

=== 12

+ /*
+ * XXX: We are ignoring in NUMA version reporting of the following regions
+ * (compare to pg_get_shmem_allocations() case): 1. output shared memory
+ * allocated but not counted via the shmem index 2. output as-of-yet
+ * unused shared memory
+ */

why XXX?

what about?

"
We are ignoring the following memory regions (as compared to
pg_get_shmem_allocations())....

=== 13

+ page_ptrs = palloc(sizeof(void *) * shm_total_page_count);
+ memset(page_ptrs, 0, sizeof(void *) * shm_total_page_count);

maybe we could use palloc0() here?

=== 14

and I realize that we could probably use it in 0002 for os_page_ptrs.

=== 15

I think there is still some multi-lines comments that are missing a period. I
probably also missed some in 0002 during the previous review. I think that's
worth another check.

=== 16

+ * In order to get reliable results we also need to touch memory
+ * pages so that inquiry about NUMA zone doesn't return -2.
+ */

maybe use the same wording as in 0002?

=== 17

The logic in 0003 looks ok to me. I don't like the 2 loops on shm_ent_page_count
but (as for 0002) it looks like we can not avoid it (or at least I don't see
a way to avoid it).

I'll still review the whole set of patches 0001, 0002 and 0003 once 0003 is
updated.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2025-03-17 16:48:29 Re: maintenance_work_mem = 64kB doesn't work for vacuum
Previous Message Robert Haas 2025-03-17 16:05:22 Re: pgsql: Avoid invalidating all RelationSyncCache entries on publication