Re: Draft for basic NUMA observability

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Draft for basic NUMA observability
Date: 2025-04-07 19:51:17
Message-ID: c0d02e4e-6eeb-47d9-9971-f65aa7264ab4@vondra.me
Lists: pgsql-hackers

On 4/7/25 20:11, Bertrand Drouvot wrote:
> Hi,
>
> On Mon, Apr 07, 2025 at 12:42:21PM -0400, Andres Freund wrote:
>> Hi,
>>
>> On 2025-04-07 18:36:24 +0200, Tomas Vondra wrote:
>>
>> I was thinking of checking if the BufferDesc indicates BM_VALID or
>> BM_TAG_VALID.
>
> Yeah, that's what I did propose in [1] (when we were speaking about get_mempolicy())
> and I think that would make sense as future improvement.
>
>>
>>
>>> I think we need to decide whether the current patches are good enough
>>> for PG18, with the current behavior, and then maybe improve that in
>>> PG19.
>>
>> I think as long as the docs mention this with <note> or <warning> it's ok for
>> now.
>
> +1
>
> A few comments on v27:
>
> === 1
>
> pg_buffercache_numa() reports the node ID as "nodeid" while pg_shmem_allocations_numa()
> reports it as node_id. Maybe we should use the same "naming" in both.
>

This was renamed in v28 to "numa_node" in both parts.

> === 2
>
> postgres=# select count(*) from pg_buffercache;
> count
> -------
> 65536
> (1 row)
>
> but
>
> postgres=# select count(*) from pg_buffercache_numa;
> count
> -------
> 64
> (1 row)
>
> with:
>
> postgres=# show block_size;
> block_size
> ------------
> 2048
>
> and Hugepagesize: 2048 kB.
>
> and
>
> postgres=# show shared_buffers;
> shared_buffers
> ----------------
> 128MB
> (1 row)
>
> And even if for testing I set:
>
> - funcctx->max_calls = idx;
> + funcctx->max_calls = 65536;
>
> then I start to see weird results:
>
> postgres=# select count(*) from pg_buffercache_numa where bufferid not in (select bufferid from pg_buffercache);
> count
> -------
> 65472
> (1 row)
>
> So it looks like that the new way to iterate on the buffers that has been introduced
> in v26/v27 has some issue?
>

Yeah, the calculations of the end pointers were wrong - we need to round
up (using TYPEALIGN()) when calculating the number of pages, and just add
BLCKSZ (without any rounding) when calculating the end of a buffer. The
0004 patch fixes this for me (I tried it with various block sizes / page
sizes).
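
To illustrate the rounding rule, here is a minimal standalone sketch. It is
not the code from 0004: ALIGN_DOWN()/ALIGN_UP() are stand-ins modeled on
TYPEALIGN_DOWN()/TYPEALIGN(), and pages_spanned() and buffer_page_entries()
are hypothetical helper names used only for this example. It assumes the
page size is a power of two and treats addresses as plain integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-ins modeled on PostgreSQL's TYPEALIGN_DOWN() and TYPEALIGN().
 * The alignment "a" must be a power of two.
 */
#define ALIGN_DOWN(a, p) ((uintptr_t) (p) & ~((uintptr_t) (a) - 1))
#define ALIGN_UP(a, p) \
	(((uintptr_t) (p) + ((uintptr_t) (a) - 1)) & ~((uintptr_t) (a) - 1))

/*
 * Number of OS pages overlapped by the region [start, start + len).
 * The end of the region is rounded UP to a page boundary, so a partially
 * covered last page is still counted.
 */
static size_t
pages_spanned(uintptr_t start, size_t len, size_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	uintptr_t	first = ALIGN_DOWN(page_size, start);
	uintptr_t	last = ALIGN_UP(page_size, start + len);

	return (last - first) / page_size;
}

/*
 * Number of (buffer, page) pairs for nbuffers buffers of blcksz bytes
 * starting at base. The end of each buffer is simply buf + blcksz, with
 * no rounding; only the starting page is aligned down.
 */
static size_t
buffer_page_entries(uintptr_t base, size_t nbuffers, size_t blcksz,
					size_t page_size)
{
	size_t		entries = 0;

	for (size_t i = 0; i < nbuffers; i++)
	{
		uintptr_t	buf = base + i * blcksz;
		uintptr_t	end = buf + blcksz;

		for (uintptr_t p = ALIGN_DOWN(page_size, buf); p < end; p += page_size)
			entries++;
	}

	return entries;
}
```

With the numbers from the report above (65536 buffers of 2048 bytes, 2 MB
huge pages, and an aligned start), pages_spanned() gives 64 pages and
buffer_page_entries() gives 65536 entries, i.e. one row per buffer rather
than one row per page.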

Thanks for noticing this!

regards

--
Tomas Vondra

Attachment                                                       Content-Type  Size
v29-0001-Add-support-for-basic-NUMA-awareness.patch              text/x-patch  22.1 KB
v29-0002-Introduce-pg_shmem_allocations_numa-view.patch          text/x-patch  18.9 KB
v29-0003-Add-pg_buffercache_numa-view-with-NUMA-node-info.patch  text/x-patch  22.0 KB
v29-0004-fixup.patch                                             text/x-patch  1.9 KB