Re: Draft for basic NUMA observability

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Draft for basic NUMA observability
Date: 2025-04-01 20:17:16
Message-ID: c7ce0a21-18dd-4682-bcd1-f9424bb7e2b9@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've spent a bit of time reviewing this. In general I haven't found
anything I'd call a bug, but here's a couple comments for v18 ... Most
of this is in separate "review" commits, with a couple exceptions.

1) Please update the commit messages, with proper formatting, etc. I
tried to do that in the attached v19, but please go through that, add
relevant details, update list of reviewers, etc. The subject should not
be overly long, etc.

2) I don't think we need "libnuma for NUMA awareness" in configure, I'd
use just "libnuma support" similar to other libraries.

3) I don't think we need pg_numa.h to have this:

extern PGDLLIMPORT Datum pg_numa_available(PG_FUNCTION_ARGS);

AFAICS we don't have any SQL functions exposed as PGDLLIMPORT, so why
would it be necessary here? It's enough to have a prototype in .c file.

4) Improved .sgml to have acronym/productname in a couple places.

5) I don't think the comment for pg_buffercache_init_entries() is very
useful. That it's helper for pg_buffercache_pages() tells me nothing
about how to use it, what the arguments are, etc.

6) IMHO pg_buffercache_numa_prepare_ptrs() would deserve a better
comment too. I mean, there's no info about what the arguments are, which
arguments are input or output, etc. And it only discussed one option
(block page < memory page), but what happens in the other case? The
formulas with blk2page/blk2pageoff are not quite clear to me (I'm not
saying it's wrong).

However, it seems rather suspicious that pages_per_blk is calculated as
float, and pg_buffercache_numa_prepare_ptrs() then does this:

for (size_t j = 0; j < pages_per_blk; j++)
{ ... }

I mean, isn't this vulnerable to rounding errors, which might trigger
some weird behavior? If not, it'd be good to add a comment why this is
fine, it confuses me a lot. I personally would probably prefer doing
just integer arithmetic here.

7) This para in the docs seems self-contradictory:

<para>
The <function>pg_buffercache_numa_pages()</function> provides the same
information
as <function>pg_buffercache_pages()</function> but is slower because
it also
provides the <acronym>NUMA</acronym> node ID per shared buffer entry.
The <structname>pg_buffercache_numa</structname> view wraps the
function for
convenient use.
</para>

I mean, "provides the same information, but is slower because it
provides different information" is strange. I understand the point, but
maybe rephrase somehow?

8) Why is pg_numa_available() marked as volatile? Should not change in a
running cluster, no?

9) I noticed the new SGML docs have IDs with mixed "-" and "_". Maybe
let's not do that.

<sect2 id="pgbuffercache-pg-buffercache_numa">

10) I think it'd be good to mention/explain why move_pages is used
instead of get_mempolicy - more efficient with batching, etc. This would
be useful both in the commit message and before the move_pages call (and
in general to explain why pg_buffercache_numa_prepare_ptrs prepares the
pointers like this etc.).

11) This could use UINT64_FORMAT, instead of a cast:

elog(DEBUG1, "NUMA: os_page_count=%lu os_page_size=%zu
pages_per_blk=%.2f",
(unsigned long) os_page_count, os_page_size, pages_per_blk);

regards

--
Tomas Vondra

Attachment Content-Type Size
v19-0001-Add-support-for-basic-NUMA-awareness.patch text/x-patch 21.8 KB
v19-0002-pgindent.patch text/x-patch 711 bytes
v19-0003-review.patch text/x-patch 3.9 KB
v19-0004-pg_buffercache-split-pg_buffercache_pages-into-p.patch text/x-patch 12.9 KB
v19-0005-review.patch text/x-patch 1.6 KB
v19-0006-Add-pg_buffercache_numa-view-with-NUMA-node-info.patch text/x-patch 19.1 KB
v19-0007-review.patch text/x-patch 1.0 KB
v19-0008-Add-pg_shmem_numa_allocations-to-show-NUMA-node.patch text/x-patch 16.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2025-04-01 20:22:07 Re: Adding support for SSLKEYLOGFILE in the frontend
Previous Message Andres Freund 2025-04-01 20:17:05 Re: TEMP_CONFIG vs test_aio