From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Draft for basic NUMA observability |
Date: | 2025-02-27 15:34:25 |
Message-ID: | Z8CGAdTbRzKXo0dq@ip-10-97-1-34.eu-west-3.compute.internal |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Thu, Feb 27, 2025 at 10:05:46AM +0100, Jakub Wartak wrote:
> On Wed, Feb 26, 2025 at 6:13 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> [..]
> > > Meanwhile v5 is attached with slight changes to try to make cfbot happy:
> >
> > Thanks for the updated version!
> >
> > FWIW, I had to do a few changes to get an error free compiling experience with
> > autoconf/or meson and both with or without the libnuma configure option.
> >
> > Sharing here as .txt files:
> >
> > Also the pg_buffercache test fails without the libnuma configure option. Maybe
> > some tests should depend of the libnuma configure option.
> [..]
>
> Thank you so much for this Bertrand !
>
> I've applied those , played a little bit with configure and meson and
> reproduced the test error and fixed it by silencing that NOTICE in
> tests. So v6 is attached even before I get a chance to start using
> that CI. Still waiting for some input and tests regarding that earlier
> touchpages attempt, docs are still missing...
Thanks for the new version!
I did some tests and it looks like it's giving correct results. I don't see -2
anymore and every backend reports correct distribution (with or without hp,
with "small" or "large" shared buffer).
A few random comments:
=== 1
+ /*
+ * This is for gathering some NUMA statistics. We might be using
+ * various DB block sizes (4kB, 8kB , .. 32kB) that end up being
+ * allocated in various different OS memory pages sizes, so first we
+ * need to understand the OS memory page size before calling
+ * move_pages()
+ */
+ os_page_size = pg_numa_get_pagesize();
+ os_page_count = ((uint64)NBuffers * BLCKSZ) / os_page_size;
+ pages_per_blk = (float) BLCKSZ / os_page_size;
+
+ elog(DEBUG1, "NUMA os_page_count=%d os_page_size=%ld pages_per_blk=%f",
+ os_page_count, os_page_size, pages_per_blk);
+
+ os_page_ptrs = palloc(sizeof(void *) * os_page_count);
+ os_pages_status = palloc(sizeof(int) * os_page_count);
+ memset(os_page_ptrs, 0, sizeof(void *) * os_page_count);
+
+ /*
+ * If we ever get 0xff back from kernel inquiry, then we probably have
+ * bug in our buffers to OS page mapping code here
+ */
+ memset(os_pages_status, 0xff, sizeof(int) * os_page_count);
I think that if (query_numa) check should also wrap that entire section of code.
=== 2
+ if (query_numa)
+ {
+ blk2page = (int) i * pages_per_blk;
+ j = 0;
+ do
+ {
This check is done for every page. I wonder if it would not make sense
to create a brand new function for pg_buffercache_numa and just let the
current pg_buffercache_pages() as it is. That said it would be great to avoid
code duplication as much a possible though, maybe using a shared
populate_buffercache_entry() or such helper function?
=== 3
+#define ONE_GIGABYTE 1024*1024*1024
+ if ((i * os_page_size) % ONE_GIGABYTE == 0)
+ CHECK_FOR_INTERRUPTS();
+ }
Did you observe noticable performance impact if calling CHECK_FOR_INTERRUPTS()
for every page instead? (I don't see with a 30GB shared buffer). I've the
feeling that we could get rid of the "ONE_GIGABYTE" check.
=== 4
+ pfree(os_page_ptrs);
+ pfree(os_pages_status);
Not sure that's needed, we should be in a short-lived memory context here
(ExprContext or such).
=== 5
+/* SQL SRF showing NUMA zones for allocated shared memory */
+Datum
+pg_get_shmem_numa_allocations(PG_FUNCTION_ARGS)
+{
That's a good idea.
+ for (i = 0; i < shm_ent_page_count; i++)
+ {
+ /*
+ * In order to get reliable results we also need to touch memory
+ * pages so that inquiry about NUMA zone doesn't return -2.
+ */
+ volatile uint64 touch pg_attribute_unused();
+
+ page_ptrs[i] = (char *) ent->location + (i * os_page_size);
+ pg_numa_touch_mem_if_required(touch, page_ptrs[i]);
That sounds right.
Could we also avoid some code duplication with pg_get_shmem_allocations()?
Also same remarks about pfree() and ONE_GIGABYTE as above.
A few other things:
==== 6
+++ b/src/backend/storage/ipc/shmem.c
@@ -73,6 +73,7 @@
#include "storage/shmem.h"
#include "storage/spin.h"
#include "utils/builtins.h"
+#include "port/pg_numa.h"
Not at the right position, should be between those 2:
#include "miscadmin.h"
#include "storage/lwlock.h"
==== 7
+/*-------------------------------------------------------------------------
+ *
+ * pg_numa.h
+ * Miscellaneous functions for bit-wise operations.
description is not correct. Also the "Copyright (c) 2019-2025" might be
"Copyright (c) 2025" instead.
=== 8
+++ b/src/port/pg_numa.c
@@ -0,0 +1,150 @@
+/*-------------------------------------------------------------------------
+ *
+ * numa.c
+ * Basic NUMA portability routines
s/numa.c/pg_numa.c/ ?
=== 9
+++ b/contrib/pg_buffercache/pg_buffercache_pages.c
@@ -6,6 +6,7 @@
* contrib/pg_buffercache/pg_buffercache_pages.c
*-------------------------------------------------------------------------
*/
+#include "pg_config.h"
#include "postgres.h"
Is this new include needed?
#include "access/htup_details.h"
@@ -13,10 +14,12 @@
#include "funcapi.h"
#include "storage/buf_internals.h"
#include "storage/bufmgr.h"
+#include "port/pg_numa.h"
+#include "storage/pg_shmem.h"
not in the right order.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2025-02-27 15:40:18 | Re: proposal - plpgsql - support standard syntax for named arguments for cursors |
Previous Message | Gilles Darold | 2025-02-27 15:33:31 | Re: proposal: plpgsql, new check for extra_errors - strict_expr_check |