Re: Draft for basic NUMA observability

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Draft for basic NUMA observability
Date: 2025-03-14 10:05:28
Message-ID: CAKZiRmyXPfHMZngT29LCbG2bvkemYBWcKpiqM9JhE4CaiTRjvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 13, 2025 at 3:15 PM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:

Hi,

Thank you very much for the review! I'm answering to both reviews in
one go and the results is attached v12, seems it all should be solved
now:

> > > === 2
> > >
> > > +else
> > > + as_fn_error $? "header file <numa.h> is required for --with-libnuma" "$LINENO" 5
> > > +fi
> > >
> > > Maybe we should add the same test (checking for numa.h) for meson?
> >
> > TBH, I have no idea, libnuma.h may exist but it may not link e.g. when
> > cross-compiling 32-bit on 64-bit. Or is this more about keeping sync
> > between meson and autoconf?
>
> Yeah, idea was to have both in sync.

Added.

> > > === 8
> > >
> > > # create extension pg_buffercache;
> > > ERROR: could not load library "/home/postgres/postgresql/pg_installed/pg18/lib/pg_buffercache.so": /home/postgres/postgresql/pg_installed/pg18/lib/pg_buffercache.so: undefined symbol: pg_numa_query_pages
> > > CONTEXT: SQL statement "CREATE FUNCTION pg_buffercache_pages()
> > > RETURNS SETOF RECORD
> > > AS '$libdir/pg_buffercache', 'pg_buffercache_pages'
> > > LANGUAGE C PARALLEL SAFE"
> > > extension script file "pg_buffercache--1.2.sql", near line 7
> > >
> > > While that's ok if 0003 is applied. I think that each individual patch should
> > > "fully" work individually.
> >
> > STILL OPEN QUESTION: Not sure I understand: you need to have 0001 +
> > 0002 or 0001 + 0003, but here 0002 is complaining about lack of
> > pg_numa_query_pages() which is part of 0001 (?) Should I merge those
> > patches or keep them separate?
>
> Applying 0001 + 0002 only does produce the issue above. I think that each
> sub-patch once applied should pass make check-world, that means:
>
> 0001 : should pass
> 0001 + 0002 : should pass
> 0001 + 0002 + 0003 : should pass

OK, I've retested v11 for all three of them. It worked fine (I think
in v10 I've moved one function to 0001, but pg_numa_query_pages() as
per error message above was always in the 0001).

> > > Also I need to think more about how firstUseInBackend is used, for example:
> > >
> > > ==== 17.1
> > >
> > > would it be better to define it as a file scope variable? (at least that
> > > would avoid to pass it as an extra parameter in some functions).
> >
> > I have no strong opinion on this, but I have one doubt (for future):
> > isn't creating global variables making life harder for upcoming
> > multithreading guys ?
>
> That would be "just" one more. I think it's better to use it to avoid the "current"
> code using "useless" function parameters.

Done.

> > > === 23
> > >
> > > + if (s >= 0)
> > > + zones[s]++;
> > >
> > > should we also check that s is below a limit?
> >
> > STILL OPEN QUESTION: I'm not sure it would give us value to report
> > e.g. -2 on per shm entry/per numa node entry, would it? If it would we
> > could somehow overallocate that array and remember negative ones too.
>
> I meant to say, ensure that it is below the max node number.

Done, but I doubt the kernel would return a value higher than
numa_max_nodes(), but who knows. Additional defense for this array is
now there.

> SECOND REVIEW//v11-0001 review
>
> === 1
>
> git am produces:
>
> .git/rebase-apply/patch:378: new blank line at EOF.
> +
> .git/rebase-apply/patch:411: new blank line at EOF.
> +
> warning: 2 lines add whitespace errors.

Should be gone, but in at least one case (0003/numa.out) we need to
have empty EOF because otherwise expected tests don't pass (even if
numa.sql doesnt have EOF in numa.sql)

> === 2
>
> + Returns true if a <acronym>NUMA</acronym> support is available.
>
> What about "Returns true if the server has been compiled with NUMA support"?

Done.

> === 3
>
> +Datum
> +pg_numa_available(PG_FUNCTION_ARGS)
> +{
> + if(pg_numa_init() == -1)
> + PG_RETURN_BOOL(false);
> + PG_RETURN_BOOL(true);
> +}
>
> What about PG_RETURN_BOOL(pg_numa_init() != -1)?
>
> Also I wonder if pg_numa.c would not be a better place for it.

Both done.

> === 4
>
> +{ oid => '5102', descr => 'Is NUMA compilation available?',
> + proname => 'pg_numa_available', provolatile => 'v', prorettype => 'bool',
> + proargtypes => '', prosrc => 'pg_numa_available' },
> +
>
> Not sure that 5102 is a good choice.
>
> src/include/catalog/unused_oids is telling me:
>
> Best practice is to start with a random choice in the range 8000-9999.
> Suggested random unused OID: 9685 (23 consecutive OID(s) available starting here)
>
> So maybe use 9685 instead?

It's because i've got 5101 there earlier (for pg_shm_numa_allocs
view), but I've aligned both (5102(at)0001 and 5101(at)0003) to 968[56].

> === 5
>
> @@ -12477,3 +12481,4 @@
> prosrc => 'gist_stratnum_common' },
>
> ]
> +
>
> garbage?

Yea, fixed.

> === 6
>
> Run pgindent as it looks like it's finding some things to do in src/backend/storage/ipc/shmem.c
> and src/port/pg_numa.c.

Fixed.

> === 7
>
> +Size
> +pg_numa_get_pagesize(void)
> +{
> + Size os_page_size = sysconf(_SC_PAGESIZE);
> + if (huge_pages_status == HUGE_PAGES_ON)
> + GetHugePageSize(&os_page_size, NULL);
> + return os_page_size;
> +}
>
> I think that's more usual to:
>
> Size
> pg_numa_get_pagesize(void)
> {
> Size os_page_size = sysconf(_SC_PAGESIZE);
>
> if (huge_pages_status == HUGE_PAGES_ON)
> GetHugePageSize(&os_page_size, NULL);
>
> return os_page_size;
> }
>
> I think that makes sense to check huge_pages_status as you did.

Done.

> === 8
>
> +extern void numa_warn(int num, char *fmt,...) pg_attribute_printf(2, 3);
> +extern void numa_error(char *where);
>
> I wonder if that would make sense to add a comment mentioning
> github.com/numactl/numactl/blob/master/libnuma.c here.

Sure thing, added.

-J.

Attachment Content-Type Size
v12-0003-Add-pg_shmem_numa_allocations-to-show-NUMA-zones.patch application/octet-stream 15.9 KB
v12-0002-Extend-pg_buffercache-with-new-view-pg_buffercac.patch application/octet-stream 27.8 KB
v12-0001-Add-optional-dependency-to-libnuma-Linux-only-fo.patch application/octet-stream 18.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yuya Watari 2025-03-14 10:09:38 Re: [PoC] Reducing planning time when tables have many partitions
Previous Message Thomas Munro 2025-03-14 09:03:15 Re: Some read stream improvements