Quick Links

Re: Draft for basic NUMA observability

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc:	Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Draft for basic NUMA observability
Date:	2025-04-04 14:40:28
Message-ID:	d68f3d67-0d62-449c-8762-18be447629bb@vondra.me
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 4/4/25 08:50, Bertrand Drouvot wrote:
> Hi,
>
> On Thu, Apr 03, 2025 at 08:53:57PM +0200, Tomas Vondra wrote:
>> On 4/3/25 15:12, Jakub Wartak wrote:
>>> On Thu, Apr 3, 2025 at 1:52 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>>
>>>> ...
>>>>
>>>> So unless someone can demonstrate a use case where this would matter,
>>>> I'd not worry about it too much.
>>>
>>> OK, fine for me - just 3 cols for pg_buffercache_numa is fine for me,
>>> it's just that I don't have cycles left today and probably lack skills
>>> (i've never dealt with arrays so far) thus it would be slow to get it
>>> right... but I can pick up anything tomorrow morning.
>>>
>>
>> OK, I took a stab at reworking/simplifying this the way I proposed.
>> Here's v24 - needs more polishing, but hopefully enough to show what I
>> had in mind.
>>
>> It does these changes:
>>
>> 1) Drops 0002 with the pg_buffercache refactoring, because the new view
>> is not "extending" the existing one.
>
> I think that makes sense. One would just need to join on the pg_buffercache
> view to get more information about the buffer if needed.
>
> The pg_buffercache_numa_pages() doc needs an update though as I don't think that
> "+ The <function>pg_buffercache_numa_pages()</function> provides the same
> information as <function>pg_buffercache_pages()</function>" is still true.
>

Right, thanks for checking the docs.

>> 2) Reworks pg_buffercache_num to return just three columns, bufferid,
>> page_num and node_id. page_num is a sequence starting from 0 for each
>> buffer.
>
> +1 on the idea
>
>> 3) It now builds an array of records, with one record per buffer/page.
>>
>> 4) I realized we don't really need to worry about buffers_per_page very
>> much, except for logging/debugging. There's always "at least one page"
>> per buffer, even if an incomplete one, so we can do this:
>>
>> os_page_count = NBuffers * Max(1, pages_per_buffer);
>>
>> and then
>>
>> for (i = 0; i < NBuffers; i++)
>> {
>> for (j = 0; j < Max(1, pages_per_buffer); j++)
>
> That's a nice simplification as we always need to take care of at least one page
> per buffer.
>

OK. I think I'll consider moving some of this code "building" the
entries into a separate function, to keep the main function easier to
understand.

>> and everything just works fine, I think.
>
> I think the same.
>
>> Opinions? I personally find this much cleaner / easier to understand.
>
> I agree that's easier to understand and that that looks correct.
>
> A few random comments:
>
> === 1
>
> It looks like that firstNumaTouch is not set to false anymore.
>

Damn, my mistake.

> === 2
>
> + pfree(os_page_status);
> + pfree(os_page_ptrs);
>
> Not sure that's needed, we should be in a short-lived memory context here
> (ExprContext or such).
>

Yeah, maybe. It's not allocated in the multi-call context, but I wasn't
sure. Will check.

> === 3
>
> + ro_volatile_var = *(uint64 *)ptr
>
> space missing before "ptr"?
>

Interesting the pgindent didn't tweak this.

regards

--
Tomas Vondra

In response to

Re: Draft for basic NUMA observability at 2025-04-04 06:50:07 from Bertrand Drouvot

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Nathan Bossart	2025-04-04 14:47:37	Re: Extend ALTER DEFAULT PRIVILEGES for large objects
Previous Message	Andres Freund	2025-04-04 14:39:53	Re: Replace IN VALUES with ANY in WHERE clauses during optimization