From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Draft for basic NUMA observability |
Date: | 2025-03-10 10:14:07 |
Message-ID: | Z867b09EIZncOsmV@ip-10-97-1-34.eu-west-3.compute.internal |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Fri, Mar 07, 2025 at 12:33:27PM +0100, Jakub Wartak wrote:
> On Fri, Mar 7, 2025 at 11:20 AM Jakub Wartak
> <jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> >
> > Hi,
> > On Wed, Mar 5, 2025 at 10:30 AM Jakub Wartak
> > <jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> > >Hi,
> >
> > > > Yeah, that's why I was mentioning to use a "shared" populate_buffercache_entry()
> > > > or such function: to put the "duplicated" code in it and then use this
> > > > shared function in pg_buffercache_pages() and in the new numa related one.
> > >
> > > OK, so hastily attempted that in 7b , I had to do a larger refactor
> > > there to avoid code duplication between those two. I don't know which
> > > attempt is better though (7 vs 7b)..
> > >
> >
> > I'm attaching basically the earlier stuff (v7b) as v8 with the
> > following minor changes:
> > - docs are included
> > - changed int8 to int4 in one function definition for numa_zone_id
>
> .. and v9 attached because cfbot partially complained about
> .cirrus.tasks.yml being adjusted recently (it seems master is hot
> these days).
>
Thanks for the new version!
Some random comments on 0001:
=== 1
It does not compiles "alone". It's missing:
+++ b/src/include/storage/pg_shmem.h
@@ -45,6 +45,7 @@ typedef struct PGShmemHeader /* standard header for all Postgres shmem */
extern PGDLLIMPORT int shared_memory_type;
extern PGDLLIMPORT int huge_pages;
extern PGDLLIMPORT int huge_page_size;
+extern PGDLLIMPORT int huge_pages_status;
and
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -563,7 +563,7 @@ static int ssl_renegotiation_limit;
*/
int huge_pages = HUGE_PAGES_TRY;
int huge_page_size;
-static int huge_pages_status = HUGE_PAGES_UNKNOWN;
+int huge_pages_status = HUGE_PAGES_UNKNOWN;
That come with 0002. So it has to be in 0001 instead.
=== 2
+else
+ as_fn_error $? "header file <numa.h> is required for --with-libnuma" "$LINENO" 5
+fi
Maybe we should add the same test (checking for numa.h) for meson?
=== 3
+# FIXME: filter-out / with/without with_libnuma?
+LIBS += $(LIBNUMA_LIBS)
It looks to me that we can remove those 2 lines.
=== 4
+ Only supported on Linux.
s/on Linux/on platforms for which the libnuma library is implemented/?
I did a quick grep on "Only supported on" and it looks like that could be
a more consistent wording.
=== 5
+#include "c.h"
+#include "postgres.h"
+#include "port/pg_numa.h"
+#include "storage/pg_shmem.h"
+#include <unistd.h>
I had a closer look to other header files and it looks like it "should" be:
#include "c.h"
#include "postgres.h"
#include <unistd.h>
#ifdef WIN32
#include <windows.h>
#endif
#include "port/pg_numa.h"
#include "storage/pg_shmem.h"
And is "#include "c.h"" really needed?
=== 6
+/* FIXME not tested, might crash */
That's a bit scary.
=== 7
+ * XXX: for now we issue just WARNING, but long-term that might depend on
+ * numa_set_strict() here
s/here/here./
Did not look carefully at all the comments in 0001, 0002 and 0003 though.
A few random comments regarding 0002:
=== 8
# create extension pg_buffercache;
ERROR: could not load library "/home/postgres/postgresql/pg_installed/pg18/lib/pg_buffercache.so": /home/postgres/postgresql/pg_installed/pg18/lib/pg_buffercache.so: undefined symbol: pg_numa_query_pages
CONTEXT: SQL statement "CREATE FUNCTION pg_buffercache_pages()
RETURNS SETOF RECORD
AS '$libdir/pg_buffercache', 'pg_buffercache_pages'
LANGUAGE C PARALLEL SAFE"
extension script file "pg_buffercache--1.2.sql", near line 7
While that's ok if 0003 is applied. I think that each individual patch should
"fully" work individually.
=== 9
+ do
+ {
+ if (os_page_ptrs[blk2page + j] == 0)
blk2page + j will be repeated multiple times, better to store it in a local
variable instead?
=== 10
+ if (firstUseInBackend == true)
if (firstUseInBackend) instead?
=== 11
+ int j = 0,
+ blk2page = (int) i * pages_per_blk;
I wonder if size_t is more appropriate for blk2page:
size_t blk2page = (size_t)(i * pages_per_blk) maybe?
=== 12
as we know that we'll iterate until pages_per_blk then would a for loop be more
appropriate here, something like?
"
for (size_t j = 0; j < pages_per_blk; j++)
{
if (os_page_ptrs[blk2page + j] == 0)
{
"
=== 13
+ if (buf_state & BM_DIRTY)
+ fctx->record[i].isdirty = true;
+ else
+ fctx->record[i].isdirty = false;
fctx->record[i].isdirty = (buf_state & BM_DIRTY) != 0 maybe?
=== 14
+ if ((buf_state & BM_VALID) && (buf_state & BM_TAG_VALID))
+ fctx->record[i].isvalid = true;
+ else
+ fctx->record[i].isvalid = false;
fctx->record[i].isvalid = ((buf_state & BM_VALID) && (buf_state & BM_TAG_VALID))
maybe?
=== 15
+populate_buffercache_entry(int i, BufferCachePagesContext *fctx)
I wonder if "i" should get a more descriptive name?
=== 16
s/populate_buffercache_entry/pg_buffercache_build_tuple/? (to be consistent
with pg_stat_io_build_tuples for example).
I now realize that I did propose populate_buffercache_entry() up-thread,
sorry for changing my mind.
=== 17
+ static bool firstUseInBackend = true;
maybe we should give it a more descriptive name?
Also I need to think more about how firstUseInBackend is used, for example:
==== 17.1
would it be better to define it as a file scope variable? (at least that
would avoid to pass it as an extra parameter in some functions).
=== 17.2
what happens if firstUseInBackend is set to false and later on the pages
are moved to different NUMA nodes. Then pg_buffercache_numa_pages() is
called again by a backend that already had set firstUseInBackend to false,
would that provide accurate information?
=== 18
+Datum
+pg_buffercache_numa_pages(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ BufferCachePagesContext *fctx; /* User function context. */
+ bool query_numa = true;
I don't think we need query_numa anymore in pg_buffercache_numa_pages().
I think that we can just ERROR (or NOTICE and return) here:
+ if (pg_numa_init() == -1)
+ {
+ elog(NOTICE, "libnuma initialization failed or NUMA is not supported on this platform, some NUMA data might be unavailable.");
+ query_numa = false;
and fully get rid of query_numa.
=== 19
And then here:
for (i = 0; i < NBuffers; i++)
{
populate_buffercache_entry(i, fctx);
if (query_numa)
pg_buffercache_numa_prepare_ptrs(i, pages_per_blk, os_page_size, os_page_ptrs, firstUseInBackend);
}
if (query_numa)
{
if (pg_numa_query_pages(0, os_page_count, os_page_ptrs, os_pages_status) == -1)
elog(ERROR, "failed NUMA pages inquiry: %m");
for (i = 0; i < NBuffers; i++)
{
int blk2page = (int) i * pages_per_blk;
/*
* Technically we can get errors too here and pass that to
* user. Also we could somehow report single DB block spanning
* more than one NUMA zone, but it should be rare.
*/
fctx->record[i].numa_zone_id = os_pages_status[blk2page];
}
maybe we can just loop a single time over "for (i = 0; i < NBuffers; i++)"?
A few random comments regarding 0003:
=== 20
+ <indexterm zone="view-pg-shmem-numa-allocations">
+ <primary>pg_shmem_numa_allocations</primary>
+ </indexterm>
+
+ <para>
+ The <structname>pg_shmem_allocations</structname> view shows NUMA nodes
s/pg_shmem_allocations/pg_shmem_numa_allocations/?
=== 21
+ /* FIXME: should we release LWlock here ? */
+ elog(ERROR, "failed NUMA pages inquiry status: %m");
There is no need, see src/backend/storage/lmgr/README.
=== 22
+#define MAX_NUMA_ZONES 32 /* FIXME? */
+ Size zones[MAX_NUMA_ZONES];
could we rely on pg_numa_get_max_node() instead?
=== 23
+ if (s >= 0)
+ zones[s]++;
should we also check that s is below a limit?
=== 24
Regarding how we make use of pg_numa_get_max_node(), are we sure there is
no possible holes? I mean could a system have node 0,1 and 3 but not 2?
Also I don't think I'm a Co-author, I think I'm just a reviewer (even if I
did a little in 0001 though)
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Bykov Ivan | 2025-03-10 10:17:45 | RE: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET |
Previous Message | Steven Niu | 2025-03-10 10:07:34 | Re: [Patch] remove duplicated smgrclose |